[committed] RISC-V: Regen riscv-ext.texi [NFC]
Regenerates the `riscv-ext.texi` file in the GCC documentation. gcc/ChangeLog: * doc/riscv-ext.texi: Regen. --- gcc/doc/riscv-ext.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/riscv-ext.texi b/gcc/doc/riscv-ext.texi index e69a2df768d..c3ed1bfb593 100644 --- a/gcc/doc/riscv-ext.texi +++ b/gcc/doc/riscv-ext.texi @@ -520,7 +520,7 @@ @item smrnmi @tab 1.0 -@tab Resumable Non-Maskable Interrupts +@tab Resumable non-maskable interrupts @item smstateen @tab 1.0 -- 2.34.1
[wwwdocs, committed] gcc-16/changes.html + projects/gomp/: OpenMP/OpenACC update
First real entries for https://gcc.gnu.org/gcc-16/changes.html New API routines for OpenMP and OpenACC Added one supported and one partial to https://gcc.gnu.org/projects/gomp/ (Once a bit more is implemented, the partial one [declare mapper] will be also be added to gcc-16/changes.html and libgomp.texi [and hence onlinedocs/libgomp]. But that's for another day.) Tobias commit 152a09eae9d5852e2f628c3e8b3156bf744e63cf Author: Tobias Burnus Date: Tue Jun 10 06:21:12 2025 +0200 gcc-16/changes.html + projects/gomp/: OpenMP/OpenACC update * gcc-16/changes.html: Add OpenMP + OpenACC sections and mention new API routines. * projects/gomp/: Add first GCC 16 features. --- htdocs/gcc-16/changes.html | 29 + htdocs/projects/gomp/index.html | 6 +++--- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/htdocs/gcc-16/changes.html b/htdocs/gcc-16/changes.html index 9f25ed38..bec41705 100644 --- a/htdocs/gcc-16/changes.html +++ b/htdocs/gcc-16/changes.html @@ -40,6 +40,35 @@ a work-in-progress. New Languages and Language specific improvements +OpenMP + + + See the + GNU Offloading and Multi-Processing Project (GOMP) + page for general information. + + + + OpenMP 6.0: The https://gcc.gnu.org/onlinedocs/libgomp/omp_005ftarget_005fmemset.html";> + omp_target_memset and https://gcc.gnu.org/onlinedocs/libgomp/omp_005ftarget_005fmemset_005fasync.html";> + omp_target_memset_async API routines have been + added. + + +OpenACC + +See the GCC https://gcc.gnu.org/wiki/OpenACC";>OpenACC wiki page +for general information. + + + The https://gcc.gnu.org/onlinedocs/libgomp/acc_005fmemcpy_005fdevice.html";> + acc_memcpy_device and acc_memcpy_device_async + API routines have been added for C, C++ and Fortran. + + diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html index 07b872ef..de877bad 100644 --- a/htdocs/projects/gomp/index.html +++ b/htdocs/projects/gomp/index.html @@ -513,8 +513,8 @@ than listed, depending on resolved corner cases and optimizations. declare mapper directive -No - +GCC 16 +Initial support for C/C++, only OMPT interface @@ -1533,7 +1533,7 @@ error. omp_target_memset and omp_target_memset_async routines -No +GCC 16
Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not
Ping On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > From: "hongtao.liu" > > AutoFDO profile is a scaled profile, as a result, 0 sample does not > mean never executed. especially there's profile from function > body. Prevent combine_with_ipa_count·(ipa_count) from zeroing all > bb->count. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > OK for trunk? > > gcc/ChangeLog: > > PR gcov-profile/118551 > * predict.cc (estimate_bb_frequencies): Prevent > combine_with_ipa_count·(ipa_count) from zeroing all bb->count > when function body is known to be hot. > --- > gcc/predict.cc | 73 +- > 1 file changed, 42 insertions(+), 31 deletions(-) > > diff --git a/gcc/predict.cc b/gcc/predict.cc > index ef31c48bfe2..7d4bf5261ad 100644 > --- a/gcc/predict.cc > +++ b/gcc/predict.cc > @@ -4105,40 +4105,51 @@ estimate_bb_frequencies () >if (freq_max < 16) > freq_max = 16; >profile_count ipa_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa (); > - cfun->cfg->count_max = profile_count::uninitialized (); > - FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) > + > + /* AutoFDO profile is a scaled profile, as a result, 0 sample does not > + mean never executed. especially there's profile from function > + body. Prevent combine_with_ipa_count·(ipa_count) from zeroing all > + bb->count. */ > + if (!(ipa_count.quality () == AFDO > + && cfun->cfg->count_max.quality () == AFDO > + && !ipa_count.nonzero_p () > + && cfun->cfg->count_max.nonzero_p ())) > { > - sreal tmp = BLOCK_INFO (bb)->frequency; > - if (tmp >= 1) > + cfun->cfg->count_max = profile_count::uninitialized (); > + FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) > { > - gimple_stmt_iterator gsi; > - tree decl; > - > - /* Self recursive calls can not have frequency greater than 1 > -or program will never terminate. This will result in an > -inconsistent bb profile but it is better than greatly confusing > -IPA cost metrics. */ > - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > - if (is_gimple_call (gsi_stmt (gsi)) > - && (decl = gimple_call_fndecl (gsi_stmt (gsi))) != NULL > - && recursive_call_p (current_function_decl, decl)) > - { > - if (dump_file) > - fprintf (dump_file, "Dropping frequency of recursive call" > - " in bb %i from %f\n", bb->index, > - tmp.to_double ()); > - tmp = (sreal)9 / (sreal)10; > - break; > - } > + sreal tmp = BLOCK_INFO (bb)->frequency; > + if (tmp >= 1) > + { > + gimple_stmt_iterator gsi; > + tree decl; > + > + /* Self recursive calls can not have frequency greater than 1 > +or program will never terminate. This will result in an > +inconsistent bb profile but it is better than greatly > confusing > +IPA cost metrics. */ > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + if (is_gimple_call (gsi_stmt (gsi)) > + && (decl = gimple_call_fndecl (gsi_stmt (gsi))) != NULL > + && recursive_call_p (current_function_decl, decl)) > + { > + if (dump_file) > + fprintf (dump_file, "Dropping frequency of recursive > call" > + " in bb %i from %f\n", bb->index, > + tmp.to_double ()); > + tmp = (sreal)9 / (sreal)10; > + break; > + } > + } > + tmp = tmp * freq_max; > + profile_count count = profile_count::from_gcov_type > (tmp.to_nearest_int ()); > + > + /* If we have profile feedback in which this function was never > +executed, then preserve this info. */ > + if (!(bb->count == profile_count::zero ())) > + bb->count = count.guessed_local ().combine_with_ipa_count > (ipa_count); > + cfun->cfg->count_max = cfun->cfg->count_max.max (bb->count); > } > - tmp = tmp * freq_max; > - profile_count count = profile_count::from_gcov_type > (tmp.to_nearest_int ()); > - > - /* If we have profile feedback in which this function was never > -executed, then preserve this info. */ > - if (!(bb->count == profile_count::zero ())) > - bb->count = count.guessed_local ().combine_with_ipa_count (ipa_count); > - cfun->cfg->count_max = cfun->cfg->count_max.max (bb->count); > } > >free_aux_for_blocks (); > -- > 2.34.1 > -- BR, Hongtao
Re: [PATCH 1/3] c: partial fix for qualifier inconsistency [PR120510]
On Sat, 7 Jun 2025, Martin Uecker wrote: > Checking assertion revealed that we sometimes produce > composite types with incorrect qualifiers, e.g. the example > > int f(int [_Atomic]); > int f(int [_Atomic]); > int f(int [_Atomic]); > > was rejected because atomic was lost in the second declaration. > > PR c/120510 > > gcc/ChangeLog: > * c-typeck.cc (composite_types_internal): Handle arrays > declared with atomic for function arguments. > > gcc/testsuite/ChangeLog: > * gcc.dg/pr120510.c OK. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH 2/3] c: partial fix for qualifier inconsistency II [PR120510]
On Sat, 7 Jun 2025, Martin Uecker wrote: > This fixes a case where we invoke composite_type with types > that do not have matching qualifiers. With this change, we can > activate the checking assertion that makes sure the composite > type is compatible with the two original types also for arrays. > > PR c/120510 > > gcc/c/ChangeLog: > * c-typeck.c (composite_type_internal): Activate checking > assertion for arrays. > (common_pointer_type): Remove qualifiers also from arrays. OK. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH] c, c++: Save 8 bytes of memory in lang_type for non-ObjC*
On Mon, 9 Jun 2025, Jakub Jelinek wrote: > Hi! > > For C++26 P2786R13 I'm afraid I'll need 4 new flags on class types > in struct lang_type (1 bit for trivially_relocatable_if_eligible, > 1 for replaceable_if_eligible, 1 for not_trivially_relocatable and > 1 for not_replaceable) and there are just 2 bits left. > > The following patch is an attempt to save 8 bytes of memory > in those structures when not compiling ObjC or ObjC++ (I think those > are used fairly rarely and the patch keeps the sizes unmodified for > those 2). The old allocations were 32 bytes for C and 120 bytes > for C++. The patch moves the objc_info member last in the C++ case > (it was already last in the C case), arranges for GC to skip it > for C and C++ but walk for ObjC and ObjC++ and allocates or > copies over just offsetof bytes instead of sizeof. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2025-06-09 Jakub Jelinek > > gcc/c/ > * c-lang.h (union lang_type::maybe_objc_info): New type. > (struct lang_type): Use union maybe_objc_info info member > instead of tree objc_info. > * c-decl.cc (finish_struct): Allocate struct lang_type using > ggc_internal_cleared_alloc instead of ggc_cleared_alloc, > and use sizeof (struct lang_type) for ObjC and otherwise > offsetof (struct lang_type, info) as size. > (finish_enum): Likewise. The C changes are OK. -- Joseph S. Myers josmy...@redhat.com
Re: [wwwdocs] Add C status page (check, small tweaks)
On Mon, 9 Jun 2025, Marek Polacek wrote: > I've checked our C99 status table against the list in Annex M.5 in C23 > (n3220). > I found no issues. For this it probably makes sense to refer to the latest C2y draft, but there shouldn't be any significant changes to the pre-C23 lists there. (C2y lists in the drafts, however, may well be incomplete as regards changes integrated into C2y up to that point, and also subject to major reworking in future.) > This patch renames the title of our C status page in preparation for adding > C11 and C23 lists. > > W3 validated. Ok? OK. -- Joseph S. Myers josmy...@redhat.com
[PATCH 1/2] libstdc++: Directly implement ranges::heap algos [PR100795]
ranges::push_heap, ranges::pop_heap, ranges::make_heap and ranges::sort_heap are currently defined in terms of the corresponding STL-style algorithms, but this is incorrect because the STL-style algorithms rely on the legacy iterator system, and so misbehave when passed a narrowly C++20 random access iterator. The other ranges heap algos, ranges::is_heap and ranges::is_heap_until, are implemented directly already and have no known issues. This patch reimplements these ranges:: algos directly instead, based closely on the legacy stl_algo.h implementation, with the following changes for compatibility with the C++20 iterator system: - handle non-common ranges by computing the corresponding end iterator - pass and invoke a projection function as necessary - use std::__invoke when invoking the comparator - use ranges::iter_move instead of std::move(*iter) - use iter_value_t / iter_difference_t instead of iterator_traits Besides these changes, the implementation of these algorithms is intended to mirror the stl_algo.h implementations, for ease of maintenance and review. PR libstdc++/100795 libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (__detail::__push_heap): New, based on the stl_algo.h implementation. (__push_heap_fn::operator()): Reimplement in terms of the above. (__detail::__adjust_heap): New, based on the stl_algo.h implementation. (__deatil::__pop_heap): Likewise. (__pop_heap_fn::operator()): Reimplement in terms of the above. (__make_heap_fn::operator()): Likewise. (__sort_heap_fn::operator()): Likewise. * testsuite/25_algorithms/heap/constrained.cc (test03): New test. --- libstdc++-v3/include/bits/ranges_algo.h | 138 -- .../25_algorithms/heap/constrained.cc | 46 ++ 2 files changed, 168 insertions(+), 16 deletions(-) diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h index a62c3cd3954d..f84e0c42d76a 100644 --- a/libstdc++-v3/include/bits/ranges_algo.h +++ b/libstdc++-v3/include/bits/ranges_algo.h @@ -1881,6 +1881,30 @@ namespace ranges inline constexpr __shuffle_fn shuffle{}; + namespace __detail + { +template + constexpr void + __push_heap(_Iter __first, + iter_difference_t<_Iter> __holeIndex, + iter_difference_t<_Iter> __topIndex, + iter_value_t<_Iter> __value, + _Comp __comp, _Proj __proj) + { + auto __parent = (__holeIndex - 1) / 2; + while (__holeIndex > __topIndex + && std::__invoke(__comp, + std::__invoke(__proj, *(__first + __parent)), + std::__invoke(__proj, __value))) + { + *(__first + __holeIndex) = ranges::iter_move(__first + __parent); + __holeIndex = __parent; + __parent = (__holeIndex - 1) / 2; + } + *(__first + __holeIndex) = std::move(__value); + } + } // namespace __detail + struct __push_heap_fn { template _Sent, @@ -1890,10 +1914,16 @@ namespace ranges operator()(_Iter __first, _Sent __last, _Comp __comp = {}, _Proj __proj = {}) const { - auto __lasti = ranges::next(__first, __last); - std::push_heap(__first, __lasti, - __detail::__make_comp_proj(__comp, __proj)); - return __lasti; + if constexpr (!same_as<_Iter, _Sent>) + return (*this)(__first, ranges::next(__first, __last), +std::move(__comp), std::move(__proj)); + else + { + __detail::__push_heap(__first, (__last - __first) - 1, + 0, ranges::iter_move(__last - 1), + __comp, __proj); + return __last; + } } template + constexpr void + __adjust_heap(_Iter __first, + iter_difference_t<_Iter> __holeIndex, + iter_difference_t<_Iter> __len, + iter_value_t<_Iter> __value, + _Comp __comp, _Proj __proj) + { + auto __topIndex = __holeIndex; + auto __secondChild = __holeIndex; + while (__secondChild < (__len - 1) / 2) + { + __secondChild = 2 * (__secondChild + 1); + if (std::__invoke(__comp, + std::__invoke(__proj, *(__first + __secondChild)), + std::__invoke(__proj, *(__first + (__secondChild - 1) + __secondChild--; + *(__first + __holeIndex) = ranges::iter_move(__first + __secondChild); + __holeIndex = __secondChild; + } + if ((__len & 1) == 0 && __secondChild == (__len - 2) / 2) + { + __secondChild = 2 * (__secondChild + 1); + *(__first + __holeIndex) = ranges::iter_move(__first + (__secondChild - 1)); +
[PATCH] libstdc++: Make __max_size_type and __max_diff_type structural
Tested on x86_64-pc-linux-gnu, does this look OK for trunk? -- >8 -- This patch makes these integer-class types structural types by public-izing their data memberss so that they could be used as NTTP types. I don't think the standard requires this, but it seems like a useful extension. libstdc++-v3/ChangeLog: * include/bits/max_size_type.h (__max_size_type::_M_val): Make public instead of private. (__max_size_type::_M_msb): Likewise. (__max_diff_type::_M_rep): Likewise. * testsuite/std/ranges/iota/max_size_type.cc: Verify __max_diff_type and __max_size_type are structural. --- libstdc++-v3/include/bits/max_size_type.h | 4 ++-- libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc | 7 +++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h index 5bec0b5a519a..e602b1b4bee5 100644 --- a/libstdc++-v3/include/bits/max_size_type.h +++ b/libstdc++-v3/include/bits/max_size_type.h @@ -425,10 +425,11 @@ namespace ranges using __rep = unsigned long long; #endif static constexpr size_t _S_rep_bits = sizeof(__rep) * __CHAR_BIT__; -private: + __rep _M_val = 0; unsigned _M_msb:1 = 0; +private: constexpr explicit __max_size_type(__rep __val, int __msb) noexcept : _M_val(__val), _M_msb(__msb) @@ -752,7 +753,6 @@ namespace ranges { return !(__l < __r); } #endif -private: __max_size_type _M_rep = 0; friend class __max_size_type; diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc index 3e6f954ceb0c..4739d9e2f790 100644 --- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc +++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc @@ -400,6 +400,13 @@ static_assert(max_diff_t(max_size_t(1) << (numeric_limits::digits-1)) == numeric_limits::min()); +// Verify that the types are structural types and can therefore be used +// as NTTP types. +template struct Su { static_assert(V*V == V+132); }; +template struct Ss { static_assert(V*V == V+132); }; +template struct Su<12>; +template struct Ss<12>; + int main() { -- 2.50.0.rc1.89.g8db3019401
[PATCH] driver: Try to read spec from gcc_exec_prefix if possible
GCC will try to read the spec file from the directory where it is installed, but it should try to read from gcc_exec_prefix rather than standard_exec_prefix, because the latter is not the right one if compiler has been relocated into other places other than the path specfied at configuration time. gcc/ChangeLog: * gcc.cc (driver::set_up_specs): Use gcc_exec_prefix to read the spec file rather than standard_exec_prefix. --- gcc/gcc.cc | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/gcc.cc b/gcc/gcc.cc index 4e61de2a47c..235fe801988 100644 --- a/gcc/gcc.cc +++ b/gcc/gcc.cc @@ -8506,11 +8506,13 @@ driver::set_up_specs () const spec_machine_suffix = just_machine_suffix; #endif + const char *exec_prefix += gcc_exec_prefix ? gcc_exec_prefix : standard_exec_prefix; /* We need to check standard_exec_prefix/spec_machine_suffix/specs for any override of as, ld and libraries. */ - specs_file = (char *) alloca (strlen (standard_exec_prefix) - + strlen (spec_machine_suffix) + sizeof ("specs")); - strcpy (specs_file, standard_exec_prefix); + specs_file = (char *) alloca ( +strlen (exec_prefix) + strlen (spec_machine_suffix) + sizeof ("specs")); + strcpy (specs_file, exec_prefix); strcat (specs_file, spec_machine_suffix); strcat (specs_file, "specs"); if (access (specs_file, R_OK) == 0) -- 2.34.1
[to-be-committed][RISC-V] Fix ICE due to splitter emitting constant loads directly
This is a fix for a bug found internally in Ventana using the cf3 testsuite. cf3 looks to be dead as a project and likely subsumed by modern fuzzers. In fact internally we tripped another issue with cf3 that had already been reported by Edwin with the fuzzer he runs. Anyway, the splitter in question blindly emits the 2nd adjusted constant into a register, that's not valid if the constant requires any kind of synthesis -- and it well could since we're mostly focused on the first constant turning into something that can be loaded via LUI without increasing the cost of the second constant. Instead of using the split RTL template, this just emits the code we want directly, using riscv_move_insn to synthesize the constant into the provided temporary register. Tested in my system. Waiting on upstream CI's verdict before moving forward. jeffgcc/ * config/riscv/riscv.md (lui-constraintand_to_or): Do not use the RTL template for split code. Emit it directly taking care to avoid emitting a constant load that needed synthesis. Fix formatting. gcc/testsuite/ * gcc.target/riscv/ventana-16122.c: New test. diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 6d3c80a04c7..3aed25c2588 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -884,7 +884,7 @@ (define_insn "*addsi3_extended2" ;; Where C1 is not a LUI operand, but ~C1 is a LUI operand (define_insn_and_split "*lui_constraint_and_to_or" - [(set (match_operand:X 0 "register_operand" "=r") + [(set (match_operand:X 0 "register_operand" "=r") (plus:X (and:X (match_operand:X 1 "register_operand" "r") (match_operand 2 "const_int_operand")) (match_operand 3 "const_int_operand"))) @@ -898,13 +898,21 @@ (define_insn_and_split "*lui_constraint_and_to_or" <= riscv_const_insns (operands[3], false)))" "#" "&& reload_completed" - [(set (match_dup 4) (match_dup 5)) - (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4))) - (set (match_dup 4) (match_dup 6)) - (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))] + [(const_int 0)] { operands[5] = GEN_INT (~INTVAL (operands[2])); operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3]))); + +/* This is always a LUI operand, so it's safe to just emit. */ +emit_move_insn (operands[4], operands[5]); + +rtx x = gen_rtx_IOR (word_mode, operands[1], operands[4]); +emit_move_insn (operands[0], x); + +/* This may require multiple steps to synthesize. */ +riscv_emit_move (operands[4], operands[6]); +x = gen_rtx_MINUS (word_mode, operands[0], operands[4]); +emit_move_insn (operands[0], x); } [(set_attr "type" "arith")]) diff --git a/gcc/testsuite/gcc.target/riscv/ventana-16122.c b/gcc/testsuite/gcc.target/riscv/ventana-16122.c new file mode 100644 index 000..59e6467b57c --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/ventana-16122.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target { rv64 } } } */ + +extern void NG (void); +typedef signed char int8_t; +typedef signed short int16_t; +typedef signed int int32_t; +void f74(void) { + int16_t x309 = 0x7fff; + volatile int32_t x310 = 0x7fff; + int8_t x311 = 59; + int16_t x312 = -0x8000; + static volatile int32_t t74 = 614992577; + +t74 = (x309==((x310^x311)%x312)); + +if (t74 != 0) { NG(); } else { ; } + +} +
Re: [PATCH] forwprop: Change optimize_agr_copyprop into forward walk instead of backwards
On Mon, Jun 9, 2025 at 2:49 AM Richard Biener wrote: > > On Sun, Jun 8, 2025 at 7:52 PM Andrew Pinski wrote: > > > > While thinking about how to implement the rest of the copy prop and makes > > sure not > > to introduce some compile time problems, optimize_agr_copyprop should be > > changed > > into a forwproping rather than looking backwards. > > Can you explain the issues when doing the backwards looking? There is no direct problems with backwards looking. The problem comes if we want to say prop into a function call the original of the copy say: ``` a = b; f(a); ``` Backwards looking means you need to look back for all aggregate function call arguments. So you waste time always looking back. Instead of looking forwards only when there is a copy. we could also even do some proping like for: ``` a = LC0; _t = a[i]; ``` Which sometimes shows up. Here it does not matter if we load from LC0 or a for _t for most cores; though there might be a store forward penalty in some cases. Doing this prop even without fully removing the copy is better in the end. > > Note since this is a pure local "propagation" I still wonder whether > (now in the forward direction) > we want to use single_imm_use () instead of iterating over all uses of > the VDEF. Basically > we want to keep the optimization as "local" as possible. So for vops, single_imm_use is not true if we have say: ``` t = a; _2 = *b; c = t; ``` or: ``` t = a; if (c) goto BB2; else BB3; BB2: c = t; ... BB3: c = t; ``` > > I also wondered about > > b = a; > c = b; > d = b; > > we'll transform c = b into c = a and that's it. > > In the end I still hope we can leverage more global analysis, like > that of SRA, to decide > whether propagation is profitable or not, thus whether we can elide a > temporary. In > that regard, shouldn't we avoid the propagation if 'DEST' has its > address taken given > we certainly cannot elide it? I am not even sure it matters on avoiding the propagation because for large aggregates you will be getting an memcpy and it might be faster not from the values which were just stored to. In the case of small aggregates, the memcpy will be inlined using load/stores and the RTL level will get rid of the loads that just happen and you end up with one set of loads followed by 2 sets of stores. We are also removing some of the store forward penalty in many cases even if we can't elide the copy. Thanks, Andrew > > Thanks, > Richard. > > > Bootstrapped and tested on x86_64-linux-gnu. > > > > gcc/ChangeLog: > > > > * tree-ssa-forwprop.cc (optimize_agr_copyprop): Change into a > > forward looking (looking at vdef's uses) instead of a back > > looking (vuse's def). > > > > Signed-off-by: Andrew Pinski > > --- > > gcc/tree-ssa-forwprop.cc | 121 +++ > > 1 file changed, 60 insertions(+), 61 deletions(-) > > > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > index 43b1c9d696f..2dc77ccba1d 100644 > > --- a/gcc/tree-ssa-forwprop.cc > > +++ b/gcc/tree-ssa-forwprop.cc > > @@ -1344,12 +1344,12 @@ optimize_memcpy_to_memset (gimple_stmt_iterator > > *gsip, tree dest, tree src, tree > >return true; > > } > > /* Optimizes > > - a = c; > > - b = a; > > + DEST = SRC; > > + DEST2 = DEST; # DEST2 = SRC2; > > into > > - a = c; > > - b = c; > > - GSIP is the second statement and SRC is the common > > + DEST = SRC; > > + DEST2 = SRC; > > + GSIP is the first statement and SRC is the common > > between the statements. > > */ > > static bool > > @@ -1365,65 +1365,64 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > >if (operand_equal_p (dest, src, 0)) > > return false; > > > > - tree vuse = gimple_vuse (stmt); > > - /* If the vuse is the default definition, then there is no store > > beforehand. */ > > - if (SSA_NAME_IS_DEFAULT_DEF (vuse)) > > -return false; > > - gimple *defstmt = SSA_NAME_DEF_STMT (vuse); > > - if (!gimple_assign_load_p (defstmt) > > - || !gimple_store_p (defstmt)) > > -return false; > > - if (gimple_has_volatile_ops (defstmt)) > > -return false; > > - > > - tree dest2 = gimple_assign_lhs (defstmt); > > - tree src2 = gimple_assign_rhs1 (defstmt); > > - > > - /* If the original store is `src2 = src2;` skip over it. */ > > - if (operand_equal_p (src2, dest2, 0)) > > -return false; > > - if (!operand_equal_p (src, dest2, 0)) > > -return false; > > - > > - > > - /* For 2 memory refences and using a temporary to do the copy, > > - don't remove the temporary as the 2 memory references might overlap. > > - Note t does not need to be decl as it could be field. > > - See PR 22237 for full details. > > - E.g. > > - t = *a; > > - *b = t; > > - Cannot be convert into > > - t = *a; > > - *b = *a; > > - Though the following is allowed to be done: > > - t = *a; > > - *a = t; > > - And convert it into: > > - t = *a; > > -
Re: [RFC PATCH v2] cselib: Reuse VALUEs on reg adjustments
On 12/3/24 9:57 PM, Bohan Lei wrote: This is v2 of the patch in https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669380.html. I missed the ChangeLog entry in that version. The commit 2c0fa3ecf70d199af18785702e9e0548fd3ab793 reuses VALUEs on sp adjustments. We can generalize the idea and reuse VALUEs on other registers. This can help the postreload pass find more opportunities to simplify insns. The following assembly code is generated from the testcase using the current trunk compiler: .L5: movq%rbp, %rsi movq%rbx, %rdi calll movq%rbx, %rsi addq$4, %rbx testl %eax, %eax je .L6 leaq-4(%rbx), %rax cmpq%rax, %rbp je .L6 movq%rbx, %rdi callas The lea instruction is actually redundant here, as rsi contains the same value as rbx-4 and can be used in the cmp instruction instead of rax. With this patch, the lea instruction can be eliminated in the postreload pass. Bootstrapped and regtested on x86_64-pc-linux-gnu, no regressions. gcc/ChangeLog: * cselib.cc (REG_DERIVED_VALUE_P): New macro. (cselib_hasher::equal): Use REG_DERIVED_VALUE_P in place of SP_DERIVED_VALUE_P. (autoinc_split): Ditto. (rtx_equal_for_cselib_1): Ditto. (cselib_hash_plus_const_int): Ditto. (cselib_subst_to_values): Ditto. (cselib_lookup_1): Set REG_DERIVED_VALUE_P for newly created VALUEs for registers. Use REG_DERIVED_VALUE_P in place of SP_DERIVED_VALUE_P for PRESERVED_VALUE_P logic. * rtl.h (struct GTY): Mention that the volatil bit is used as REG_DERIVED_VALUE_P in cselib.cc. gcc/testsuite/ChangeLog: * gcc.target/i386/cselib-1.c: New test. --- gcc/cselib.cc| 29 ++-- gcc/rtl.h| 1 + gcc/testsuite/gcc.target/i386/cselib-1.c | 22 ++ 3 files changed, 40 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/cselib-1.c diff --git a/gcc/cselib.cc b/gcc/cselib.cc index e6a36e892bb..43ab266c4de 100644 --- a/gcc/cselib.cc +++ b/gcc/cselib.cc @@ -70,6 +70,9 @@ static rtx autoinc_split (rtx, rtx *, machine_mode); #define SP_DERIVED_VALUE_P(RTX) \ (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call) +#define REG_DERIVED_VALUE_P(RTX) \ + (RTL_FLAG_CHECK1 ("REG_DERIVED_VALUE_P", (RTX), VALUE)->volatil) + Are there any cases where SP_DERIVED_VALUE_P is still used after your proposed change? If not, then we probably should drop SP_DERIVED_VALUE_P. Does this potentially impact debug info? That was one of the big issues Jakub spent time on with the original patch. That's my only conceptual concern. Implementation wise it seems pretty straightforward. Jeff
[PATCH] gcc: Make int n_infiles local to gcc.cc.
In the course of stamping out cppcheck warnings, we ran across a complaint about a "shadowed variable." It turns out that a variable declared in gcc/gcc/h as "extern int n_infiles;" is used only locally in gcc/gcc/cc. This change makes that variable "static int n_infiles;" in gcc/gcc.cc. Okay for trunk? >From 1d9afbdb9c313fd58fefc5d1d284f1831942ed98 Mon Sep 17 00:00:00 2001 From: Robert Dubner mailto:rdub...@symas.com Date: Mon, 9 Jun 2025 18:38:10 -0400 Subject: [PATCH] gcc: Make int n_infiles local to gcc.cc. The variable "int n_infiles" was declared as "extern int n_infiles" although it was used only in gcc.h. This patch makes it "static int n_infiles". gcc/ChangeLog: * gcc.cc: Change "int n_infiles" to "static int n_infiles". * gcc.h: Delete "extern int n_infiles". --- gcc/gcc.cc | 2 +- gcc/gcc.h | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/gcc.cc b/gcc/gcc.cc index 4e61de2a47c..1a6cfc6056f 100644 --- a/gcc/gcc.cc +++ b/gcc/gcc.cc @@ -3651,7 +3651,7 @@ struct infile static struct infile *infiles; -int n_infiles; +static int n_infiles; static int n_infiles_alloc; diff --git a/gcc/gcc.h b/gcc/gcc.h index 5ba6c9d33dd..15ae21eea56 100644 --- a/gcc/gcc.h +++ b/gcc/gcc.h @@ -83,8 +83,6 @@ extern void lang_specific_driver (struct cl_decoded_option **, /* Called before linking. Returns 0 on success and -1 on failure. */ extern int lang_specific_pre_link (void); -extern int n_infiles; - /* Number of extra output files that lang_specific_pre_link may generate. */ extern int lang_specific_extra_outfiles; -- 2.34.1
Re: [PATCH v5 11/24] Add clone_identifier function.
On 5/29/25 6:46 AM, Alfie Richards wrote: This is similar to clone_function_name and its siblings but takes an identifier tree node rather than a function declaration. This is to be used in conjunction with the identifier node stored in cgraph_function_version_info::assembler_name to mangle FMV functions in later patches. gcc/ChangeLog: * cgraph.h (clone_identifier): New function. * cgraphclones.cc (clone_identifier): New function. clone_function_name: Refactored to use clone_identifier. OK jeff
Re: [PATCH v5 06/24] Refactor record_function_versions.
On 5/29/25 6:46 AM, Alfie Richards wrote: Renames record_function_versions to add_function_version, and make it explicit that it is adding a single version to the function structure. Additionally, change the insertion point to always maintain priority ordering of the versions. This allows for removing logic for moving the default to the first position which was duplicated across target specific code and enables easier reasoning about function sets. gcc/ChangeLog: * cgraph.cc (cgraph_node::record_function_versions): Refactor and rename to... (cgraph_node::add_function_version): new function. * cgraph.h (cgraph_node::record_function_versions): Refactor and rename to... (cgraph_node::add_function_version): new function. * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): Remove reordering. * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): Remove reordering. * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): Remove reordering. * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): Remove reordering. gcc/cp/ChangeLog: * decl.cc (maybe_version_functions): Change record_function_versions call to add_function_version. OK. It wouldn't be bad to go ahead and get this in since that will give us at least some testing independent of the rest of the kit. But that's your call IMHO. jeff
Re: [PATCH] RISC-V: Add 'bclr+binv' peephole2 optimization.
On 6/5/25 6:25 AM, Jiawei wrote: Thanks for your suggestions, I found that the ior is successful generated in combine pass without using the zbs extension, and in other architecture it also work fine. Yea, it's probably an artifact of how we need to represent the single bit clear. It's represented as an AND of a ROTATE. So we need simplify-rtx to handle this: (set (reg:DI 154) (xor:DI (and:DI (rotate:DI (const_int -2 [0xfffe]) (subreg:QI (reg:DI 156 [ b ]) 0)) (reg:DI 149 [ *a_10(D) ])) (ashift:DI (const_int 1 [0x1]) (subreg:QI (reg:DI 156 [ b ]) 0 Note how the and-rotate and the ashift of (const_int 1) both use the same rotate/shift count (reg:DI 156). That's the key. The and-rotate clears a bit and thus the xor must be setting the bit and that whole mess can simplify to ior based bit insertion. jeff
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
On Mon, Jun 09, 2025 at 12:21:16PM -0600, Jeff Law wrote: > > > On 6/9/25 9:43 AM, Stafford Horne wrote: > > > > > Hi, > > > > I do not have a fix for this yet. I feel using or1k_hard_regno_mode_ok to > > control allowing paradoxical subreging of openriscs sr_f special register > > is not > > right. It seems we would want to have something like > > or1k_regno_paradoxical_subreg_mode_ok. > > > > OpenRISC seems pretty unique in how we use (reg:BI 34 ?sr_f) for > > representing the > > condition register. The SR[F] bit is a flag bit in the supervisor register > > that > > represents if the last comparison was true (flag set) or false (flag unset). > Not sure it's that radically different than various other targets. It's > just a bit in a register. The question becomes how much to expose to GCC > and how to expose it. Reminds me a bit of the "T" register on the sh port. Right, I think the point that is unique is that the SR[F] bit register is exposed to GCC as a BI mode register. Thanks for the tip on the SH T register, it does look very similar. Though, it is exposed to GCC as an SI register. Maybe that approach is better. -Stafford
Re: [PATCH 3/3] c: Add remove_qualifier helper function [PR120510]
On Sat, 7 Jun 2025, Martin Uecker wrote: > Add a helper function to replace repeated code for removing > qualifiers but not atomic. Make sure to also remove qualifiers > but not atomic on the element type of arrays. > > PR c/120510 > > gcc/c/ChangeLog: > * c-typeck.c (remove_qualifiers): New function. > (composite_type_internal): Use it. > (comp_target_types): Use it. > (type_lists_compatible_p): Use it. > (find_anonymous_field_with_type): Use it. > (convert_to_anonymous_field): Use it. > (convert_for_assignment): Use it. OK. I think the changes regarding arrays in type_lists_compatible_p are OK because array parameters aren't possible, and in convert_for_assignment are OK because there's other code to handle pre-C23 compatibility rules, but in general it can't be assumed that removing qualifiers on arrays will be correct in all contexts; it needs to be considered case by case what removal of qualifiers should do with array types. -- Joseph S. Myers josmy...@redhat.com
[PATCH 2/2] libstdc++: Directly implement ranges::sort [PR100795]
As with the previous patch, this patch reimplements ranges::sort directly instead of incorrectly forwarding to std::sort. In addition to the compatibility changes listed in the previous patch we also: - use ranges::iter_swap instead of std::iter_swap - use ranges::move_backward instead of std::move_backward - use __bit_floor and __to_unsigned_like instead of __lg We also need to extend std::__lg to support integer-class types. PR libstdc++/100795 PR libstdc++/118209 libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (__detail::__move_median_to_first): New, based on the stl_algo.h implementation. (__detail::__unguarded_liner_insert): Likewise. (__detail::__insertion_sort): Likewise. (__detail::__sort_threshold): Likewise. (__detail::__unguarded_insertion_sort): Likewise. (__detail::__final_insertion_sort): Likewise. (__detail::__unguarded_partition): Likewise. (__detail::__unguarded_partition_pivot): Likewise. (__detail::__heap_select): Likewise. (__detail::__partial_sort): Likewise. (__detail::__introsort_loop): Likewise. (__sort_fn::operator()): Reimplement in terms of the above. * libstdc++-v3/testsuite/25_algorithms/sort/118209.cc: New test. * libstdc++-v3/testsuite/25_algorithms/sort/constrained.cc (test03): New test. --- libstdc++-v3/include/bits/ranges_algo.h | 210 +- .../testsuite/25_algorithms/sort/118209.cc| 23 ++ .../25_algorithms/sort/constrained.cc | 31 +++ 3 files changed, 260 insertions(+), 4 deletions(-) create mode 100644 libstdc++-v3/testsuite/25_algorithms/sort/118209.cc diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h index f84e0c42d76a..67bdf4287241 100644 --- a/libstdc++-v3/include/bits/ranges_algo.h +++ b/libstdc++-v3/include/bits/ranges_algo.h @@ -32,6 +32,7 @@ #if __cplusplus > 201703L +#include // __bit_floor #if __cplusplus > 202002L #include #endif @@ -2166,6 +2167,184 @@ namespace ranges inline constexpr __is_heap_fn is_heap{}; + namespace __detail + { +template + constexpr void + __move_median_to_first(_Iter __result, _Iter __a, _Iter __b, _Iter __c, +_Comp __comp, _Proj __proj) + { + if (std::__invoke(__comp, + std::__invoke(__proj, *__a), + std::__invoke(__proj, *__b))) + { + if (std::__invoke(__comp, + std::__invoke(__proj, *__b), + std::__invoke(__proj, *__c))) + ranges::iter_swap(__result, __b); + else if (std::__invoke(__comp, + std::__invoke(__proj, *__a), + std::__invoke(__proj, *__c))) + ranges::iter_swap(__result, __c); + else + ranges::iter_swap(__result, __a); + } + else if (std::__invoke(__comp, + std::__invoke(__proj, *__a), + std::__invoke(__proj, *__c))) + ranges::iter_swap(__result, __a); + else if (std::__invoke(__comp, + std::__invoke(__proj, *__b), + std::__invoke(__proj, *__c))) + ranges::iter_swap(__result, __c); + else + ranges::iter_swap(__result, __b); + } + +template + constexpr void + __unguarded_linear_insert(_Iter __last, _Comp __comp, _Proj __proj) + { + iter_value_t<_Iter> __val = ranges::iter_move(__last); + _Iter __next = __last; + --__next; + while (std::__invoke(__comp, +std::__invoke(__proj, __val), +std::__invoke(__proj, *__next))) + { + *__last = ranges::iter_move(__next); + __last = __next; + --__next; + } + *__last = std::move(__val); + } + +template + constexpr void + __insertion_sort(_Iter __first, _Iter __last, _Comp __comp, _Proj __proj) + { + if (__first == __last) return; + + for (_Iter __i = __first + 1; __i != __last; ++__i) + { + if (std::__invoke(__comp, + std::__invoke(__proj, *__i), + std::__invoke(__proj, *__first))) + { + iter_value_t<_Iter> __val = ranges::iter_move(__i); + ranges::move_backward(__first, __i, __i + 1); + *__first = std::move(__val); + } + else + __detail::__unguarded_linear_insert(__i, __comp, __proj); + } + } + +template + constexpr void + __unguarded_insertion_sort(_Iter __first, _Iter __last, +_Comp __comp, _Proj __proj) + { + for (_Iter __i = __first; __i != __last; ++__i)
[PATCH] c, c++: Fix unused result for empty types [PR82134]
Hi, This fixes PR c/82134 which concerns gcc emitting an incorrect unused result diagnostic for empty types. This diagnostic is emitted from tree-cfg.cc because of a couple code paths which attempt to avoid copying empty types, resulting in GIMPLE that isn't using the returned value of a call. To fix this I've added suppress_warning in three locations and a corresponding check in do_warn_unused_result. Cheers, Jeremy PR c/82134 gcc/cp/ChangeLog: * call.cc (build_call_a): Add suppress_warning * cp-gimplify.cc (cp_gimplify_expr): Add suppress_warning gcc/ChangeLog: * gimplify.cc (gimplify_modify_expr): Add suppress_warning * tree-cfg.cc (do_warn_unused_result): Check warning_suppressed_p gcc/testsuite/ChangeLog: * c-c++-common/attr-warn-unused-result-2.c: New test. Signed-off-by: Jeremy Rifkin --- gcc/cp/call.cc| 1 + gcc/cp/cp-gimplify.cc | 1 + gcc/gimplify.cc | 1 + .../c-c++-common/attr-warn-unused-result-2.c | 15 +++ gcc/tree-cfg.cc | 2 ++ 5 files changed, 20 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index 2c3ef3dfc35..a70fc13c6a4 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -412,6 +412,7 @@ build_call_a (tree function, int n, tree *argarray) /* We're disconnecting the initializer from its target, don't create a temporary. */ arg = TARGET_EXPR_INITIAL (arg); +suppress_warning (arg, OPT_Wunused_result); tree t = build0 (EMPTY_CLASS_EXPR, TREE_TYPE (arg)); arg = build2 (COMPOUND_EXPR, TREE_TYPE (t), arg, t); CALL_EXPR_ARG (function, i) = arg; diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc index 0fcfa16d2c5..2a21e960994 100644 --- a/gcc/cp/cp-gimplify.cc +++ b/gcc/cp/cp-gimplify.cc @@ -690,6 +690,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) && (REFERENCE_CLASS_P (op1) || DECL_P (op1))) op1 = build_fold_addr_expr (op1); +suppress_warning (op1, OPT_Wunused_result); gimplify_and_add (op1, pre_p); } gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 9f9ff92d064..fa9890e7cea 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -7305,6 +7305,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p)) && TREE_CODE (*from_p) == CALL_EXPR)) { + suppress_warning (*from_p, OPT_Wunused_result); gimplify_stmt (from_p, pre_p); gimplify_stmt (to_p, pre_p); *expr_p = NULL_TREE; diff --git a/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c b/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c new file mode 100644 index 000..09be1a933c9 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attr-warn-unused-result-2.c @@ -0,0 +1,15 @@ +// PR c/82134 +/* { dg-do compile } */ + +struct S {}; + +__attribute__((warn_unused_result)) struct S foo(); + +void use_s(struct S); + +void +test (void) +{ + struct S s = foo(); /* { dg-bogus "ignoring return value of" } */ + use_s(foo()); /* { dg-bogus "ignoring return value of" } */ +} diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index fad308e7f7b..5cd72ed1b3e 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -9961,6 +9961,8 @@ do_warn_unused_result (gimple_seq seq) break; if (gimple_call_internal_p (g)) break; + if (warning_suppressed_p (g, OPT_Wunused_result)) +break; /* This is a naked call, as opposed to a GIMPLE_CALL with an LHS. All calls whose value is ignored should be -- 2.43.0
Re: [PATCH v2] libstdc++: implement Philox Engine [PR119794]
Hi, On Thu, 22 May 2025, 1nfocalypse wrote: > Implements Philox Engine (P2075R6) and associated tests. > > v2 corrects a multiline comment left in error in serialize.cc, and > additionally corrects a bug hidden by said comment, where the stream was > given the output of 'y()' instead of 'y', causing state to be > incorrectly passed. Lastly, it fixes numerous whitespace issues found in the > original patch. My apologies for not noticing prior to the submission of the > original patch, which can now be disregarded. > > To reiterate from the original email, the template unpacking functions are > placed in a private classifier prior to the public one due to an ordering > bug, where in order to function correctly, they must be > placed prior to the bulk of the header. This is counter to the style > recommendations, but I was unable to obtain functionality any other way. > Additionally, while SIMD instructions are not utilized, and I do > not think that they would integrate well with how the generator's state is > currently handled, some structure choices could be made that may make them of > interest. > > Lastly, since word width can be specified, and thus atypical, maximum value > is calculated via some bit manipulation rather than numeric_limits, since the > specified width may differ from the width of the type > used. Thanks for the nice patch! Just to confirm, do you have a copyright assignment for GCC in place (or are covered by a corporate assignment)? If not, please complete that process, or contribute under the DCO terms, see https://gcc.gnu.org/contribute.html#legal Please don't put copyright and license headers on new tests at all, unless they really are copyright FSF, and really do contain something original and copyrightable (which I don't think applies here). This is documented at https://gcc.gnu.org/onlinedocs/libstdc++/manual/test.html#test.new_tests If the patch is being contributed under the DCO, please resubmit it with a Signed-off-by tag as per the first link :) We hope to do a full review of the patch soon. > > Built/tested on x86_64-linux-gnu. > > * 1nfocalypse > >
Re: [PATCH] doc: allow extend.texi to be processed by makeinfo 4.13
On 06.06.2025 17:28, Sandra Loosemore wrote: > On 6/6/25 00:44, Jan Beulich wrote: >> As per documentation, even 4.7 ought to suffice. At least 4.13 objects >> to there being nothing ahead of the first comma in @xref{}. >> --- >> The text inserted it merely a guess; I'm open to better suggestions. >> >> Noticed with gcc15, so may want backporting. >> >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -10667,7 +10667,7 @@ and @samp{[[omp::sequence(...)]]} in C a >> GCC needs to be invoked with the @option{-fopenmp} option. >> This option also arranges for automatic linking of the OpenMP >> runtime library. >> -@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. >> +@xref{Enabling OpenMP,,,libgomp,GNU Offloading and Multi Processing Runtime >> Library}. >> @xref{OpenMP and OpenACC Options}, for additional options useful with >> @option{-fopenmp}. >> @@ -10689,7 +10689,7 @@ To enable the processing of OpenACC dire >> in C and C++, GCC needs to be invoked with the @option{-fopenacc} option. >> This option also arranges for automatic linking of the OpenACC runtime >> library. >> -@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. >> +@xref{Enabling OpenMP,,,libgomp,GNU Offloading and Multi Processing Runtime >> Library}. >> @xref{OpenMP and OpenACC Options}, for additional options useful with >> @option{-fopenacc}. > > Hmmm, good catch on this. I clearly need to build a libgomp 4.7 manual and > use that for reference instead of the "normal" link at > > https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html > > which is for the most recent release. > > I think in both places the node name specified as the first argument should > be "Top" -- the reference is intended to be to the libgomp documentation as a > whole, not the specific section about the compilation options. I can do that, albeit libgomp.texi doesn't have a mere "Top" node. It's @node Top, Enabling OpenMP there. Followed later by @node Enabling OpenMP @chapter Enabling OpenMP Please clarify; my texinfo knowledge is pretty limited. Please also clarify if with the adjustments I ought to resend a v2, or whether this is kind of a pre-approval. > (And "Enabling OpenMP" is certainly the wrong reference for enabling >OpenACC.) Oh, that should have been "Enabling OpenACC", if at all (as per above). Jan
Re: [PATCH] libfortran: Add script to regenerate source files
Hi FX, the patch looks good to me. I only have x86_64, too, therefore I haven't tested it (again). There's a lot of repetition in the regenerate.sh file. I hope to see this "simplified" or rather DRY'ed (Don't repeat yourself - principle) in the future. So looks good to me. Ok for mainline. Regards, Andre On Mon, 9 Jun 2025 21:06:02 +0200 FX Coudert wrote: > Hi, > > This patch adds a new “regenerate.sh” script in libgfortran/, which is the > new mechanism to rebuild all source files in the generated/ folder. It > removes this from the Makefile (where it was misusing the maintainer mode, > which was not intended for that purpose). The new script is standalone, and > it should be run by developers whenever they modify a file in the m4/ > directory. It will not be triggered by the Makefile, and must be run > manually. It does not modify the timestamps or inodes of generated files, if > they are not be modified. > > The Makefile can later be simplified, but I don’t want to mix all the changes > at once, so here’s a first part. If there is agreement, I’ll work on > simplifying the Makefile next. > > Undergoing testing on x86_64-linux, OK to push? > FX > > -- Andre Vehreschild * Email: vehre ad gmx dot de
Re: [AutoFDO] Profile merging for clone test
OK, thanks! Honza
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
On Sun, Jun 08, 2025 at 07:26:44AM +0100, Stafford Horne wrote: > On Sat, Jun 07, 2025 at 06:53:28PM +0300, Dimitar Dimitrov wrote: > > On Sat, Jun 07, 2025 at 11:38:46AM +0100, Stafford Horne wrote: > > > On Fri, Jun 06, 2025 at 09:54:53PM +0100, Stafford Horne wrote: > > > > On Fri, Jun 06, 2025 at 11:31:00PM +0300, Dimitar Dimitrov wrote: > > > > > On Fri, Jun 06, 2025 at 06:12:13PM +0100, Stafford Horne wrote: > > > > > > On Fri, Jun 06, 2025 at 04:41:21PM +0100, Stafford Horne wrote: > > > > > > > On Fri, Jun 06, 2025 at 04:20:10PM +0100, Richard Sandiford wrote: > > > > > > > > Stafford Horne writes: > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > This seems to be causing a build regression on the or1k port. > > > > > > > > > > > > > > > > > > I have not quite figured it out yet but I have bisected to > > > > > > > > > this commit. > > > > > > > > > > > > > > > > > > The failure is as below, this seems to be caused by the > > > > > > > > > cstoresi4 instruction produced by > > > > > > > > > or1k.md. So I think its likely something we are doing funny > > > > > > > > > in the port. > > > > > > > > > > > > > > > > > > Thanks to Jeff for pointing this out. If you have any idea > > > > > > > > > let me know. > > > > > > > > > > > > > > > > > > Log: > > > > > > > > > > > > > > > > > > during RTL pass: ce1 > > > > > > > > > dump file: libgcc2.c.286r.ce1 > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c: In > > > > > > > > > function '__muldi3': > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c:538:1: > > > > > > > > > internal compiler error: in emit_move_multi_word, at > > > > > > > > > expr.cc:4492 > > > > > > > > > 538 | } > > > > > > > > > | ^ > > > > > > > > > 0x1b094df internal_error(char const*, ...) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic-global-context.cc:517 > > > > > > > > > 0x6459c1 fancy_abort(char const*, int, char const*) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic.cc:1815 > > > > > > > > > 0x473a2e emit_move_multi_word > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4492 > > > > > > > > > 0x93fd7d emit_move_insn(rtx_def*, rtx_def*) > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4746 > > > > > > > > > 0x9175f1 copy_to_reg(rtx_def*) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/explow.cc:630 > > > > > > > > > 0xd18977 gen_lowpart_general(machine_mode, rtx_def*) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/rtlhooks.cc:56 > > > > > > > > > 0x9414c7 convert_mode_scalar > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:818 > > > > > > > > > 0x9421a1 convert_modes(machine_mode, machine_mode, > > > > > > > > > rtx_def*, int) > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:961 > > > > > > > > > 0xc0afa0 prepare_operand(insn_code, rtx_def*, int, > > > > > > > > > machine_mode, machine_mode, int) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4637 > > > > > > > > > 0xc0b33a prepare_cmp_insn > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4538 > > > > > > > > > 0xc0f0dd emit_conditional_move(rtx_def*, rtx_comparison, > > > > > > > > > rtx_def*, rtx_def*, machine_mode, int) > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:5098 > > > > > > > > > 0x192cc7b noce_emit_cmove > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:1777 > > > > > > > > > 0x193365b try_emit_cmove_seq > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3410 > > > > > > > > > 0x193365b noce_convert_multiple_sets_1 > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3705 > > > > > > > > > 0x1933c71 noce_convert_multiple_sets > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3496 > > > > > > > > > 0x1938687 noce_process_if_block > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4045 > > > > > > > > > 0x1938687 noce_find_if_block > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4726 > > > > > > > > > 0x1938687 find_if_header > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4931 > > > > > > > > > 0x1938687 if_convert > > > > > > > > > > > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6078 > > > > > > > > > 0x1938d01 rest_of_handle_if_conversion > > > > > > > > >
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
Stafford Horne writes: > On Fri, Jun 06, 2025 at 04:41:21PM +0100, Stafford Horne wrote: >> On Fri, Jun 06, 2025 at 04:20:10PM +0100, Richard Sandiford wrote: >> > Stafford Horne writes: >> > > Hello, >> > > >> > > This seems to be causing a build regression on the or1k port. >> > > >> > > I have not quite figured it out yet but I have bisected to this commit. >> > > >> > > The failure is as below, this seems to be caused by the cstoresi4 >> > > instruction produced by >> > > or1k.md. So I think its likely something we are doing funny in the port. >> > > >> > > Thanks to Jeff for pointing this out. If you have any idea let me know. >> > > >> > > Log: >> > > >> > > during RTL pass: ce1 >> > > dump file: libgcc2.c.286r.ce1 >> > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c: In function >> > > '__muldi3': >> > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c:538:1: internal >> > > compiler error: in emit_move_multi_word, at expr.cc:4492 >> > > 538 | } >> > >| ^ >> > > 0x1b094df internal_error(char const*, ...) >> > > >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic-global-context.cc:517 >> > > 0x6459c1 fancy_abort(char const*, int, char const*) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic.cc:1815 >> > > 0x473a2e emit_move_multi_word >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4492 >> > > 0x93fd7d emit_move_insn(rtx_def*, rtx_def*) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4746 >> > > 0x9175f1 copy_to_reg(rtx_def*) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/explow.cc:630 >> > > 0xd18977 gen_lowpart_general(machine_mode, rtx_def*) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/rtlhooks.cc:56 >> > > 0x9414c7 convert_mode_scalar >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:818 >> > > 0x9421a1 convert_modes(machine_mode, machine_mode, rtx_def*, int) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:961 >> > > 0xc0afa0 prepare_operand(insn_code, rtx_def*, int, machine_mode, >> > > machine_mode, int) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4637 >> > > 0xc0b33a prepare_cmp_insn >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4538 >> > > 0xc0f0dd emit_conditional_move(rtx_def*, rtx_comparison, rtx_def*, >> > > rtx_def*, machine_mode, int) >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:5098 >> > > 0x192cc7b noce_emit_cmove >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:1777 >> > > 0x193365b try_emit_cmove_seq >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3410 >> > > 0x193365b noce_convert_multiple_sets_1 >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3705 >> > > 0x1933c71 noce_convert_multiple_sets >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3496 >> > > 0x1938687 noce_process_if_block >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4045 >> > > 0x1938687 noce_find_if_block >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4726 >> > > 0x1938687 find_if_header >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4931 >> > > 0x1938687 if_convert >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6078 >> > > 0x1938d01 rest_of_handle_if_conversion >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6143 >> > > >> > > -Stafford >> > >> > Can you attach the .i? >> >> Sure please find attached, I can reproduce with the or1k compiler using: >> >> $HOME/work/gnu-toolchain/build-gcc/./gcc/cc1 -O2 __muldi3.i >> >> I am still debugging, but I notice when reverting the patch the compiler >> never >> goes to emit_move_multi_word(). I am trying to work backwards as to where >> it goes >> different, it's just taking a bit of time as I haven't debugged GCC ICE's >> for a >> while. > > I made some progress on this. In openRISC we generate cstoresi4 with special > register SR_F (condition flag) bit register. > > This is represented as (reg:BI 34 ?sr_f). > > The RTL produced for cstoresi4 becomes something like: > > // The compare > (insn 49 48 50 (set (reg:BI 34 ?sr_f) > (leu:BI (reg/v:SI 68 [ __x2D.6783 ]) > (reg/v:SI 69 [ __x1D.6782 ]))) > "/home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c":532:22 -1 > (nil) > // The conditional branch > (jump_insn 50 49 0 (set (pc) >(if_then_else (ne (reg:BI 34 ?sr_f) >(const_int 0 [0])) >(label_ref 0) >(pc))) > "/home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c":532:22 -1 > (int_list:REG_BR_PROB 536870916 (nil))) > > During ce1 the try_emit_cmove_seq would try to convert (reg:BI 34 ?sr_f) to > SI. > Before this patch it was done in gen_lowpart_general by > simplify_context::simplify_gen_subreg. > > The simplity to subreg operation would convert: > (reg:BI 34 ?sr_
Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern [combine question]
Hi! On Tue, Apr 01, 2025 at 09:35:53PM -0600, Jeff Law wrote: > Segher -- there's a combine question near the end... > So this is a nasty little problem and touches on a deeper issue. > > The core problem is that combine doesn't know anything about > constraints. It works with predicates and conditions.So combine has > no idea if it's creating an ASM with operands that can't be handled. As > a result combine has to be careful with ASMs. If whatever combine produces passes recog(), and the insn_cost for it is less than what it started with (or the same cost is fine as well in many cases), it will allow it. The result is semantically equivalent, _by construction_. The only thing combine (or the target backend) has to be worried about is if this is a transformation that is beneficial; it always is correct. > In can_combine_p we have: > > > /* Can't merge an ASM_OPERANDS. */ > > || GET_CODE (src) == ASM_OPERANDS > > Which is part of a large conditional indicating when we can't combine > two insns together. Yes, there is no way to validly create a new extended asm (what would the template be, to start with!), and there should not be asm_operand's anywhere else. > So I would have expected it to reject this insn for > combining: > > >(insn 12 11 0 2 (set (mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64]) > >(asm_operands:DI ("") ("=A") 0 [ > >(mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64]) > >] > > [ > >(asm_input:DI ("A") j.c:8) > >] > > [] j.c:8)) "j.c":8:3 -1 > > (expr_list:REG_DEAD (reg/v/f:DI 138 [ e ]) > >(nil))) > > So the natural question is why did insn 12 participate in combination at > all: > > >Trying 11 -> 12: > > 11: r138:DI=r145:DI+r144:DI > > REG_DEAD r145:DI > > REG_DEAD r144:DI > > 12: [r138:DI]=asm_operands > > REG_DEAD r138:DI > >Failed to match this instruction: > >(set (mem/f:DI (plus:DI (reg/f:DI 145 [ b ]) > >(reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64]) > >(asm_operands:DI ("") ("=A") 0 [ > >(mem/f:DI (plus:DI (reg/f:DI 145 [ b ]) > >(reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64]) > >] > > [ > >(asm_input:DI ("A") j.c:8) > >] > > [] j.c:8)) > >allowing combination of insns 11 and 12 > >original costs 4 + 36 = 40 > >replacement cost 36 > >deferring deletion of insn with uid = 11. > >modifying insn i312: [r145:DI+r144:DI]=asm_operands > > REG_DEAD r144:DI > > REG_DEAD r145:DI > >deferring rescan insn with uid = 12. > So without a deep dive, my question is should this have been rejected > for combination before we even got here? I just don't see how combine > can do anything with asms other than replacing one pseudo with another > since combine doesn't have any notion of constraints. combine already rejects extended asm (as well as basic asm) in many places, I wonder how this got through. Was stuff in the instruction stream already malformed, perhaps? But, also, what is wrong here? combine is not combining anything in the asm, just in the argument of an asm operand, and that looks perfectly fine? Segher
Re: [PATCH] c++: Don't incorrectly reject override after class head name [PR120569]
On 6/6/25 12:35 PM, Jakub Jelinek wrote: Hi! While the https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2786r13.html#c03-compatibility-changes-for-annex-c-diff.cpp03.dcl.dcl hunk dropped because struct C {}; struct C {} final {}; is actually not valid C++98 (which didn't have list initialization), we actually also reject struct D {}; struct D {} override {}; and that IMHO is valid all the way from C++11 onwards. I don't see how it could be, and other compilers agree with me. Since the testcases don't put {} between the class name and the contextual keyword, I think this is just a typo in the comment. Especially in the light of P2786R13 adding new contextual keywords, I think it is better to use a separate routine for parsing the class-virt-specifier-seq (in C++11, there was export next to final), class-virt-specifier (in C++14 to C++23) and class-property-specifier-seq (in C++26) instead of using the same function for virt-specifier-seq and class-property-specifier-seq. Tested on x86_64-linux and i686-linux, ok for trunk? 2025-06-06 Jakub Jelinek PR c++/120569 * parser.cc (cp_parser_class_virt_specifier_seq_opt): New function. Let's refer to the new non-terminal rather than the old one, both in the name and comment of this function. (cp_parser_class_head): Use it instead of cp_parser_virt_specifier_seq_opt. Don't diagnose VIRT_SPEC_OVERRIDE here. Formatting fix. * g++.dg/cpp0x/override2.C: Expect different diagnostics with override or duplicate final. * g++.dg/cpp0x/override5.C: New test. * g++.dg/cpp0x/duplicate1.C: Expect different diagnostics with duplicate final. --- gcc/cp/parser.cc.jj 2025-06-02 11:00:14.0 +0200 +++ gcc/cp/parser.cc2025-06-06 17:20:16.344079071 +0200 @@ -28005,6 +28005,57 @@ cp_parser_class_specifier (cp_parser* pa return type; } +/* Parse an (optional) class-virt-specifier-seq. + + class-virt-specifier-seq: + class-virt-specifier class-virt-specifier-seq [opt] + + class-virt-specifier: + final + + Returns a bitmask representing the class-virt-specifiers. */ + +static cp_virt_specifiers +cp_parser_class_virt_specifier_seq_opt (cp_parser *parser) +{ + cp_virt_specifiers virt_specifiers = VIRT_SPEC_UNSPECIFIED; + + while (true) +{ + cp_token *token; + cp_virt_specifiers virt_specifier; + + /* Peek at the next token. */ + token = cp_lexer_peek_token (parser->lexer); + /* See if it's a class-virt-specifier. */ + if (token->type != CPP_NAME) + break; + if (id_equal (token->u.value, "final")) + { + maybe_warn_cpp0x (CPP0X_OVERRIDE_CONTROLS); + virt_specifier = VIRT_SPEC_FINAL; + } + else if (id_equal (token->u.value, "__final")) + virt_specifier = VIRT_SPEC_FINAL; + else + break; + + if (virt_specifiers & virt_specifier) + { + gcc_rich_location richloc (token->location); + richloc.add_fixit_remove (); + error_at (&richloc, "duplicate class-virt-specifier"); And here let's name the actual duplicate specifier rather refer to the grammar. + cp_lexer_purge_token (parser->lexer); + } + else + { + cp_lexer_consume_token (parser->lexer); + virt_specifiers |= virt_specifier; + } +} + return virt_specifiers; +} + /* Parse a class-head. class-head: @@ -28195,12 +28246,10 @@ cp_parser_class_head (cp_parser* parser, pop_deferring_access_checks (); if (id) -{ - cp_parser_check_for_invalid_template_id (parser, id, - class_key, - type_start_token->location); -} - virt_specifiers = cp_parser_virt_specifier_seq_opt (parser); +cp_parser_check_for_invalid_template_id (parser, id, +class_key, +type_start_token->location); + virt_specifiers = cp_parser_class_virt_specifier_seq_opt (parser); /* If it's not a `:' or a `{' then we can't really be looking at a class-head, since a class-head only appears as part of a @@ -28216,13 +28265,6 @@ cp_parser_class_head (cp_parser* parser, /* At this point, we're going ahead with the class-specifier, even if some other problem occurs. */ cp_parser_commit_to_tentative_parse (parser); - if (virt_specifiers & VIRT_SPEC_OVERRIDE) -{ - cp_parser_error (parser, - "cannot specify % for a class"); - type = error_mark_node; - goto out; -} /* Issue the error about the overly-qualified name now. */ if (qualified_p) { --- gcc/testsuite/g++.dg/cpp0x/override2.C.jj 2020-01-12 11:54:37.089403210 +0100 +++ gcc/testsuite/g++.dg/cpp0x/override2.C 2025-06-06 17:13:18.239613567 +0200 @@ -23,9 +23,9 @@ struct D5 : B3 {}; struct D6 : B
Re: [PATCH v3] c++: Fix template class lookup [PR120495, PR115605].
On 6/3/25 12:29 PM, Iain Sandoe wrote: Hi Jason, a bootstrap + testsuite (but without Ada or D). Hmm, it looks like make_typename_type wants to call lookup_template_class with a class as CONTEXT. But it first does the lookup separately. So I read that to indicate this does not need addressing (at this point) - we could extend it to allow class contexts. So I guess it makes sense to say that passing an identifier and context is only supported if context is a namespace. Done. + If D1 is an identifier and CONTEXT is non-NULL, then the lookup is + carried out in the CONTEXT namespace, otherwise CONTEXT is ignored and + lookup is carried out with default priority. This "otherwise" blurs two different cases: D1 is an id and CONTEXT is NULL (in which case lookup selects the innermost non-namespace binding), or D1 is not an id (in which case CONTEXT is ignored and no lookup is performed). Fixed Let's also remove the FUNCTION_CONTEXT paragraph. Done I guess we should stop passing the CONTEXT argument in non-id+ns cases such as make_typename_type and tsubst... I have not tried to address caller-side changes. + templ = maybe_get_template_decl_from_type_decl (templ); This line is unnecessary for namespace lookup, it's just for the injected-class-name. Thanks, fixed. re-tested on x86_64-darwin, powerpc64le is still running, how does it look now? OK. thanks Iain --- 8< --- In the reported issues, using lookup_template_class () directly on (for example) the coroutine_handle identifier fails because a class-local TYPE_DECL is found. This is because, in the existing code, lookup is called with default parameters which means that class contexts are examined first. Fix this, when a context is provided by the caller, by doing lookup in namespace provided. PR c++/120495 PR c++/115605 gcc/cp/ChangeLog: * pt.cc (lookup_template_class): Honour provided namespace contexts when looking up class templates. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr120495.C: New test. * g++.dg/pr115605.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/pt.cc | 30 +++- gcc/testsuite/g++.dg/coroutines/pr120495.C | 55 ++ gcc/testsuite/g++.dg/pr115605.C| 10 3 files changed, 84 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr120495.C create mode 100644 gcc/testsuite/g++.dg/pr115605.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index c5a3abe6d8b..3c8fa9a6728 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -10052,15 +10052,20 @@ tsubst_entering_scope (tree t, tree args, tsubst_flags_t complain, tree in_decl) D1 is the PTYPENAME terminal, and ARGLIST is the list of arguments. + If D1 is an identifier and CONTEXT is non-NULL, then the lookup is + carried out in CONTEXT. Currently, only namespaces are supported for + CONTEXT. + + If D1 is an identifier and CONTEXT is NULL, the lookup is performed + in the innermost non-namespace binding. + + Otherwise CONTEXT is ignored and no lookup is carried out. + IN_DECL, if non-NULL, is the template declaration we are trying to instantiate. Issue error and warning messages under control of COMPLAIN. - If the template class is really a local class in a template - function, then the FUNCTION_CONTEXT is the function in which it is - being instantiated. - ??? Note that this function is currently called *twice* for each template-id: the first time from the parser, while creating the incomplete type (finish_template_type), and the second type during the @@ -10079,20 +10084,23 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context, spec_entry **slot; spec_entry *entry; - if (identifier_p (d1)) + if (identifier_p (d1) && context) +{ + gcc_checking_assert (TREE_CODE (context) == NAMESPACE_DECL); + push_decl_namespace (context); + templ = lookup_name (d1, LOOK_where::NAMESPACE, LOOK_want::NORMAL); + pop_decl_namespace (); +} + else if (identifier_p (d1)) { tree value = innermost_non_namespace_value (d1); if (value && DECL_TEMPLATE_TEMPLATE_PARM_P (value)) templ = value; else - { - if (context) - push_decl_namespace (context); +{ templ = lookup_name (d1); templ = maybe_get_template_decl_from_type_decl (templ); - if (context) - pop_decl_namespace (); - } +} } else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1))) { diff --git a/gcc/testsuite/g++.dg/coroutines/pr120495.C b/gcc/testsuite/g++.dg/coroutines/pr120495.C new file mode 100644 index 000..f59c34a8676 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr120495.C @@ -0,0 +1,55 @@ +// { dg-additional-options "-fsyntax-only" } + +#include +#i
Re: [PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions
On Mon, Jun 09, 2025 at 02:48:58PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > On Thu, Jun 05, 2025 at 09:24:27PM +0100, Richard Sandiford wrote: > >> Spencer Abson writes: > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c > >> > b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c > >> > new file mode 100644 > >> > index 000..8f69232f2cf > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c > >> > @@ -0,0 +1,47 @@ > >> > +/* { dg-do compile } */ > >> > +/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=2048 > >> > -fno-trapping-math" } */ > >> > >> The =2048 is ok, but do you need it for these autovectorisation tests? > >> If vectorisation is treated as not profitable without it, then perhaps > >> we could switch to Tamar's -mmax-vectorization, once that's in. > > > > This isn't needed to make vectorization profitable, but rather to > > make partial vector modes the reliably obvious choice - and hopefully > > one that is isn't affected by future cost model changes. With =2048 > > and COUNT, each loop should be fully-unrolled into a single unpacked > > operation (plus setup and return). > > > > For me, this was much more flexible than using builtin vector types, > > and easier to reason about. Maybe that's just me though! I can try > > something else if it would be preferred. > > I don't really agree about the "easier to reason about" bit: IMO, > builtin vector types are the most direct and obvious way of testing > things with fixed-length vectors, for the cases that they can handle > directly. But I agree that vectorisation is more flexible, in that > it can deal with cases that fixed-length builtin vectors can't yet > handle directly. > > My main concern was that the tests didn't seem to have much coverage > of normal VLA codegen. If the aim is predictable costing, it might > be enough to use -moverride=sve_width=2048 instead of > -msve-vector-bits=2048. I see - yeah, -moverride=sve_width=2048 is enough. How about we use builtin vectors wherever possible, and fall back to the current approach (but replacing -msve-vector-bits with -moverride=sve_width) everywhere else? Alternatively, if we'd like to focus on VLA codegen, I could just replace -msve-vector-bits with -moverride=sve_width throughout the series. Thanks, Spencer > > Thanks, > Richard
Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern [combine question]
On Mon, Jun 09, 2025 at 08:26:10AM -0600, Jeff Law wrote: > On 4/1/25 9:35 PM, Jeff Law wrote: > So returning to this > > So the combine pass doesn't reject combination into an ASM_INPUT, just > combination from most ASM_INPUTs. Yeah, exactly. > It does rely on predicates to determine validity. More precisely, recog() does. > So my suspicion is > that the memory_operand (or one of the closely related) predicates is > returning true for an addressing mode is that is not always valid. Yeah, some target issue. > Or to put it another way, addressing mode validity is based on two > properties. The form of the address and the mode of the memory > reference. If any other context is needed to determine validity, then > that addressing format must not be considered valid for the given memory > access mode. > > I hope I didn't steer you the wrong way on that issue Christoph! Yeah, sounds like a bug in TARGET_LEGITIMATE_ADDRESS_P . > So at best this splitter is still just papering over a bigger problem > that needs to be fixed. Essentially all splitters that are made just for combine paper over something (not define_insn_and_split, I mean pure splitters here; sometimes you want to allow combine to create something complex). Segher
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
On 6/9/25 9:43 AM, Stafford Horne wrote: Hi, I do not have a fix for this yet. I feel using or1k_hard_regno_mode_ok to control allowing paradoxical subreging of openriscs sr_f special register is not right. It seems we would want to have something like or1k_regno_paradoxical_subreg_mode_ok. OpenRISC seems pretty unique in how we use (reg:BI 34 ?sr_f) for representing the condition register. The SR[F] bit is a flag bit in the supervisor register that represents if the last comparison was true (flag set) or false (flag unset). Not sure it's that radically different than various other targets. It's just a bit in a register. The question becomes how much to expose to GCC and how to expose it. Reminds me a bit of the "T" register on the sh port. Jeff
[pushed: r16-1348] diagnostics: convert enum logical_location_kind to enum class
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r16-1348-gbf0c6e1d34fd9c. gcc/ChangeLog: * diagnostic-format-sarif.cc (maybe_get_sarif_kind): Update for conversion of enum logical_location_kind to enum class. * diagnostic.cc (logical_location_manager::function_p): Likewise. * libgdiagnostics.cc (html-output/missing-semicolon.py::get_kind): Likewise. * logical-location.h (enum logical_location_kind): Convert to... (enum class logical_location_kind): ...this. * selftest-logical-location.cc (test_logical_location_manager::item_from_funcname): Update for conversion of enum logical_location_kind to enum class. * tree-logical-location.cc (tree_logical_location_manager::get_kind): Likewise. Signed-off-by: David Malcolm --- gcc/diagnostic-format-sarif.cc | 40 +++--- gcc/diagnostic.cc| 18 +++--- gcc/libgdiagnostics.cc | 38 ++--- gcc/logical-location.h | 42 gcc/selftest-logical-location.cc | 2 +- gcc/tree-logical-location.cc | 12 - 6 files changed, 76 insertions(+), 76 deletions(-) diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc index 7dcd126aeb23..853a8333b3cc 100644 --- a/gcc/diagnostic-format-sarif.cc +++ b/gcc/diagnostic-format-sarif.cc @@ -2735,51 +2735,51 @@ maybe_get_sarif_kind (enum logical_location_kind kind) { default: gcc_unreachable (); -case LOGICAL_LOCATION_KIND_UNKNOWN: +case logical_location_kind::unknown: return nullptr; /* Kinds within executable code. */ -case LOGICAL_LOCATION_KIND_FUNCTION: +case logical_location_kind::function: return "function"; -case LOGICAL_LOCATION_KIND_MEMBER: +case logical_location_kind::member: return "member"; -case LOGICAL_LOCATION_KIND_MODULE: +case logical_location_kind::module_: return "module"; -case LOGICAL_LOCATION_KIND_NAMESPACE: +case logical_location_kind::namespace_: return "namespace"; -case LOGICAL_LOCATION_KIND_TYPE: +case logical_location_kind::type: return "type"; -case LOGICAL_LOCATION_KIND_RETURN_TYPE: +case logical_location_kind::return_type: return "returnType"; -case LOGICAL_LOCATION_KIND_PARAMETER: +case logical_location_kind::parameter: return "parameter"; -case LOGICAL_LOCATION_KIND_VARIABLE: +case logical_location_kind::variable: return "variable"; /* Kinds within XML or HTML documents. */ -case LOGICAL_LOCATION_KIND_ELEMENT: +case logical_location_kind::element: return "element"; -case LOGICAL_LOCATION_KIND_ATTRIBUTE: +case logical_location_kind::attribute: return "attribute"; -case LOGICAL_LOCATION_KIND_TEXT: +case logical_location_kind::text: return "text"; -case LOGICAL_LOCATION_KIND_COMMENT: +case logical_location_kind::comment: return "comment"; -case LOGICAL_LOCATION_KIND_PROCESSING_INSTRUCTION: +case logical_location_kind::processing_instruction: return "processingInstruction"; -case LOGICAL_LOCATION_KIND_DTD: +case logical_location_kind::dtd: return "dtd"; -case LOGICAL_LOCATION_KIND_DECLARATION: +case logical_location_kind::declaration: return "declaration"; /* Kinds within JSON documents. */ -case LOGICAL_LOCATION_KIND_OBJECT: +case logical_location_kind::object: return "object"; -case LOGICAL_LOCATION_KIND_ARRAY: +case logical_location_kind::array: return "array"; -case LOGICAL_LOCATION_KIND_PROPERTY: +case logical_location_kind::property: return "property"; -case LOGICAL_LOCATION_KIND_VALUE: +case logical_location_kind::value: return "value"; } } diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc index 5b8eb7d14042..ab52e34e2fb5 100644 --- a/gcc/diagnostic.cc +++ b/gcc/diagnostic.cc @@ -1059,17 +1059,17 @@ logical_location_manager::function_p (key k) const { default: gcc_unreachable (); -case LOGICAL_LOCATION_KIND_UNKNOWN: -case LOGICAL_LOCATION_KIND_MODULE: -case LOGICAL_LOCATION_KIND_NAMESPACE: -case LOGICAL_LOCATION_KIND_TYPE: -case LOGICAL_LOCATION_KIND_RETURN_TYPE: -case LOGICAL_LOCATION_KIND_PARAMETER: -case LOGICAL_LOCATION_KIND_VARIABLE: +case logical_location_kind::unknown: +case logical_location_kind::module_: +case logical_location_kind::namespace_: +case logical_location_kind::type: +case logical_location_kind::return_type: +case logical_location_kind::parameter: +case logical_location_kind::variable: return false; -case LOGICAL_LOCATION_KIND_FUNCTION: -case LOGICAL_LOCATION_KIND_MEMBER: +case logical_location_kind::function: +case logical_location_k
[PATCH] or1k: Fix ICE in libgcc caused by recent validate_subreg changes
After commit eb2ea476db2 ("emit-rtl: Allow extra checks for paradoxical subregs [PR119966]") paradoxical subregs or the OpenRISC condition flag register (reg:BI sr_f) are no longer allowed. This causes and ICE in the ce1 pass which tries to get the or1k flag register into an SI register, which is no longer possible. Adjust or1k_can_change_mode_class to allow changing the or1k flag reg to SI mode which in turn allows paradoxical subregs to bre generated again. gcc/ChangeLog: PR or1k/120587 * config/or1k/or1k.cc (or1k_can_change_mode_class): Allow changing flags mode from BI to SI to allow for paradoxical subregs. --- Sending again to correct mailing list. gcc/config/or1k/or1k.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/or1k/or1k.cc b/gcc/config/or1k/or1k.cc index 62e2168e0ee..f1c92c6bf6c 100644 --- a/gcc/config/or1k/or1k.cc +++ b/gcc/config/or1k/or1k.cc @@ -1408,8 +1408,9 @@ static bool or1k_can_change_mode_class (machine_mode from, machine_mode to, reg_class_t rclass) { + /* Allow cnoverting special flags to SI mode subregs. */ if (rclass == FLAG_REGS) -return from == to; +return from == to || (from == BImode && to == SImode); return true; } -- 2.49.0
[PATCH] libfortran: Add script to regenerate source files
Hi, This patch adds a new “regenerate.sh” script in libgfortran/, which is the new mechanism to rebuild all source files in the generated/ folder. It removes this from the Makefile (where it was misusing the maintainer mode, which was not intended for that purpose). The new script is standalone, and it should be run by developers whenever they modify a file in the m4/ directory. It will not be triggered by the Makefile, and must be run manually. It does not modify the timestamps or inodes of generated files, if they are not be modified. The Makefile can later be simplified, but I don’t want to mix all the changes at once, so here’s a first part. If there is agreement, I’ll work on simplifying the Makefile next. Undergoing testing on x86_64-linux, OK to push? FX 0001-libfortran-Add-script-to-regenerate-source-files.patch Description: Binary data
[PATCH v2] c++, coroutines: Avoid UNKNOWN_LOCATION synthesizing code [PR120273].
Hi Jason, This replaces "c-lex: Handle NULL filenames from UNKNOWN_LOCATION" as we discussed off-list, you prefer a solution that has valid locations during the synthesis. I have reverted to using the function location for code that represents start-up and the closing brace for code that represents shut-down. Note that we do not have (and cannot easily get) the position of the opening brace. Tested on x86_64 darwin and powerpc64le linux, OK for trunk? thanks Iain --- 8< --- Some of the lookup code is expecting to find a valid (not UNKNOWN) location, which triggers in the reported case. To avoid this, we are reverting the change to use UNKNOWN_LOCATION for synthesizing the wrapper, and instead using the start and end locations of the original function. PR c++/120273 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::wrap_original_function_body): Use function start and end locations when synthesizing code. (cp_coroutine_transform::cp_coroutine_transform): Set the function end location. * coroutines.h: Add the function end location. gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro-missing-final-suspend.C: Adjust for changed final suspend diagnostics line number change. * g++.dg/coroutines/coro1-missing-await-method.C: Likewise. * g++.dg/coroutines/pr104051.C: Likewise. * g++.dg/coroutines/pr120273.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 18 +++--- gcc/cp/coroutines.h | 1 + .../coroutines/coro-missing-final-suspend.C | 4 +- .../coroutines/coro1-missing-await-method.C | 2 +- gcc/testsuite/g++.dg/coroutines/pr104051.C| 4 +- gcc/testsuite/g++.dg/coroutines/pr120273.C| 58 +++ 6 files changed, 75 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr120273.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 97eee6e8ea4..d482f52fefa 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -4281,8 +4281,7 @@ cp_coroutine_transform::wrap_original_function_body () { /* Avoid the code here attaching a location that makes the debugger jump. */ iloc_sentinel stable_input_loc (fn_start); - location_t loc = UNKNOWN_LOCATION; - input_location = loc; + location_t loc = fn_start; /* This will be our new outer scope. */ tree update_body @@ -4513,18 +4512,22 @@ cp_coroutine_transform::wrap_original_function_body () add_stmt (return_void); } + /* We are now doing actions associated with the end of the function, so + point to the closing brace. */ + input_location = fn_end; + /* co_return branches to the final_suspend label, so declare that now. */ fs_label = create_named_label_with_ctx (loc, "final.suspend", NULL_TREE); - add_stmt (build_stmt (loc, LABEL_EXPR, fs_label)); + add_stmt (build_stmt (fn_end, LABEL_EXPR, fs_label)); /* Before entering the final suspend point, we signal that this point has been reached by setting the resume function pointer to zero (this is what the 'done()' builtin tests) as per the current ABI. */ - zero_resume = build2_loc (loc, MODIFY_EXPR, act_des_fn_ptr_type, + zero_resume = build2_loc (fn_end, MODIFY_EXPR, act_des_fn_ptr_type, resume_fn_ptr, zero_resume); finish_expr_stmt (zero_resume); - finish_expr_stmt (build_init_or_final_await (fn_start, true)); + finish_expr_stmt (build_init_or_final_await (fn_end, true)); BIND_EXPR_BODY (update_body) = pop_stmt_list (BIND_EXPR_BODY (update_body)); BIND_EXPR_VARS (update_body) = nreverse (var_list); BLOCK_VARS (top_block) = BIND_EXPR_VARS (update_body); @@ -5181,9 +5184,10 @@ cp_coroutine_transform::cp_coroutine_transform (tree _orig_fn, bool _inl) } /* We don't have the locus of the opening brace - it's filled in later (and - there doesn't really seem to be any easy way to get at it). - The closing brace is assumed to be input_location. */ + there doesn't really seem to be any easy way to get at it). */ fn_start = DECL_SOURCE_LOCATION (orig_fn_decl); +/* The closing brace is assumed to be input_location. */ +fn_end = input_location; /* Build types we need. */ tree fr_name = get_fn_local_identifier (orig_fn_decl, "Frame"); diff --git a/gcc/cp/coroutines.h b/gcc/cp/coroutines.h index 55caa6e61e3..cb5d5572733 100644 --- a/gcc/cp/coroutines.h +++ b/gcc/cp/coroutines.h @@ -102,6 +102,7 @@ private: tree orig_fn_decl;/* The original function decl. */ tree orig_fn_body = NULL_TREE; /* The original function body. */ location_t fn_start = UNKNOWN_LOCATION; + location_t fn_end = UNKNOWN_LOCATION; tree resumer = error_mark_node; tree destroyer = error_mark_node; tree coroutine_body = NULL_TREE; diff --git a/gcc/testsuite/g++.dg/coroutines/coro-missing-final-suspend.C b/gcc/te
[PATCH v2] c++, coroutines: Improve diagnostics for awaiter/promise.
Hi Jason, >>+ error_at (loc, "%sawaitable type %qT is not a structure", >>+ extra, o_type); >Generally identifiers should be incorporated with %qs, and relying on the %s >to provide a space doesn't seem very i8n-friendly. Better, I think, to handle >the case with no identifier as a separate diagnostic. Fixed. >It looks like there's no test for the initial_suspend case? Added one, retested on x86_64-darwin, powerpc64le linux, OK for trunk? thanks, Iain --- 8< --- At present, we can issue diagnostics about missing or malformed awaiter or promise methods when we encounter their uses in the body of a user's function. We might then re-issue the same diagnostics when processing the initial or final await expressions. This change avoids such duplication, and also attempts to identify issues with the initial or final expressions specifically since diagnostics for those do not have any useful line number. gcc/cp/ChangeLog: * coroutines.cc (build_co_await): Identify diagnostics for initial and final await expressions. (cp_coroutine_transform::wrap_original_function_body): Do not handle initial and final await expressions here ... (cp_coroutine_transform::apply_transforms): ... handle them here and avoid duplicate diagnostics. * coroutines.h: Declare inital and final await expressions in the transform class. gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro1-missing-await-method.C: Adjust for improved diagnostics. * g++.dg/coroutines/pr104051.C: Move to... * g++.dg/coroutines/pr104051-0.C: ...here. * g++.dg/coroutines/pr104051-1.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 24 +++ gcc/cp/coroutines.h | 3 +++ .../coroutines/coro1-missing-await-method.C | 4 ++-- .../coroutines/{pr104051.C => pr104051-0.C} | 2 +- gcc/testsuite/g++.dg/coroutines/pr104051-1.C | 23 ++ 5 files changed, 49 insertions(+), 7 deletions(-) rename gcc/testsuite/g++.dg/coroutines/{pr104051.C => pr104051-0.C} (92%) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr104051-1.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index d482f52fefa..18c0a4812c4 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -1277,8 +1277,14 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind, if (TREE_CODE (o_type) != RECORD_TYPE) { - error_at (loc, "awaitable type %qT is not a structure", - o_type); + if (suspend_kind == FINAL_SUSPEND_POINT) + error_at (loc, "%qs awaitable type %qT is not a structure", + "final_suspend()", o_type); + else if (suspend_kind == INITIAL_SUSPEND_POINT) + error_at (loc, "%qs awaitable type %qT is not a structure", + "initial_suspend()", o_type); + else + error_at (loc, "awaitable type %qT is not a structure", o_type); return error_mark_node; } @@ -4329,7 +4335,6 @@ cp_coroutine_transform::wrap_original_function_body () /* Wrap the function body in a try {} catch (...) {} block, if exceptions are enabled. */ tree var_list = NULL_TREE; - tree initial_await = build_init_or_final_await (fn_start, false); /* [stmt.return.coroutine] / 3 If p.return_void() is a valid expression, flowing off the end of a @@ -4527,7 +4532,8 @@ cp_coroutine_transform::wrap_original_function_body () zero_resume = build2_loc (fn_end, MODIFY_EXPR, act_des_fn_ptr_type, resume_fn_ptr, zero_resume); finish_expr_stmt (zero_resume); - finish_expr_stmt (build_init_or_final_await (fn_end, true)); + finish_expr_stmt (final_await); + BIND_EXPR_BODY (update_body) = pop_stmt_list (BIND_EXPR_BODY (update_body)); BIND_EXPR_VARS (update_body) = nreverse (var_list); BLOCK_VARS (top_block) = BIND_EXPR_VARS (update_body); @@ -5266,6 +5272,16 @@ cp_coroutine_transform::apply_transforms () = coro_build_actor_or_destroy_function (orig_fn_decl, act_des_fn_type, frame_ptr_type, false); + /* Avoid repeating diagnostics about promise or awaiter fails. */ + if (!seen_error ()) +{ + iloc_sentinel stable_input_loc (fn_start); + initial_await = build_init_or_final_await (fn_start, false); + input_location = fn_end; + if (initial_await && initial_await != error_mark_node) + final_await = build_init_or_final_await (fn_end, true); +} + /* Transform the function body as per [dcl.fct.def.coroutine] / 5. */ wrap_original_function_body (); diff --git a/gcc/cp/coroutines.h b/gcc/cp/coroutines.h index cb5d5572733..7613e292c58 100644 --- a/gcc/cp/coroutines.h +++ b/gcc/cp/coroutines.h @@ -127,6 +127,9 @@ private: bool inline_p = false; bool valid_coroutine = false; + tree initial_await = error_mark_node; + tree final_await =
[wwwdocs] Add C status page (check, small tweaks)
I've checked our C99 status table against the list in Annex M.5 in C23 (n3220). I found no issues. This patch renames the title of our C status page in preparation for adding C11 and C23 lists. W3 validated. Ok? --- htdocs/projects/c-status.html | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/htdocs/projects/c-status.html b/htdocs/projects/c-status.html index 85adb158..ea5ebc4b 100644 --- a/htdocs/projects/c-status.html +++ b/htdocs/projects/c-status.html @@ -3,12 +3,27 @@ -Status of C99 features in GCC +C Standards Support in GCC https://gcc.gnu.org/gcc.css";> -Status of C99 features in GCC +C Standards Support in GCC + +GCC supports different dialects of C, corresponding to the multiple +published ISO standards. A particular standard can be selected using +the -std= command-line option. + + + C90 + C99 + C11 + C17 + C23 + C2y + + +C99 Support in GCC C99 is substantially completely supported as of GCC 4.5 (with -std=c99 -pedantic-errors used; @@ -371,6 +386,12 @@ GCC 3.0, even if there is no specific note regarding corner cases. +C90 Support in GCC + +GCC has support for ISO C90; it can be explicitly selected with the +-std=c90 command-line flag, or -std=gnu90 +to enable GNU extensions as well. + Further notes @@ -405,7 +426,7 @@ with each other and with string literals, but currently don't. The information provided by static in parameter array declarators is not used for optimization. It might make sense to use -it in future in conjunction with work +it in future in conjunction with work on prefetching. -- 2.49.0
[PATCH] c++, coroutines: Handle builtin_constant_p [PR116775].
Hi Jason, As discussed in the PR, fold the expression in the coroutine lowering. tested on x86_64-darwin, powerpc64le-linux, OK for trunk? thanks Iain --- 8< --- Since the folding of this builtin happens after the main coroutine FE lowering, we need to account for await expressions in that lowering. Since these expressions have a property of being not evaluated, but do not have the full constraints of an unevaluatated context, we want to apply the checks and then remove the await expressions so that they no longer participate in the analysis and lowering. When a builtin_constant_p call is encountered, and the operand contains any await expression, we check to see if the operand can be a constant and replace the call with its result. PR c++/116775 gcc/cp/ChangeLog: * coroutines.cc (analyze_expression_awaits): When we see a builtin_constant_p call, and that contains one or more await expressions, then replace the call with its result and discard the unevaluated operand. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr116775.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 23 +++- gcc/testsuite/g++.dg/coroutines/pr116775.C | 68 ++ 2 files changed, 90 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116775.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 4d6ec171cd5..bbdc33abc0e 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -3551,7 +3551,8 @@ maybe_promote_temps (tree *stmt, void *d) return cp_walk_tree (stmt, register_awaits, d, &visited); } -/* Lightweight callback to determine two key factors: +/* Relatively lightweight callback to do initial assessment: + 0) Rewrite some await expressions. 1) If the statement/expression contains any await expressions. 2) If the statement/expression potentially requires a re-write to handle TRUTH_{AND,OR}IF_EXPRs since, in most cases, they will need expansion @@ -3568,6 +3569,26 @@ analyze_expression_awaits (tree *stmt, int *do_subtree, void *d) switch (TREE_CODE (*stmt)) { default: return NULL_TREE; + case CALL_EXPR: + { + tree fn = cp_get_callee_fndecl_nofold (*stmt); + /* Special-cases where we want to re-write await expressions to some +other value before they are otherwise processed. */ + if (fn && DECL_IS_BUILTIN_CONSTANT_P (fn)) + { + gcc_checking_assert (call_expr_nargs (*stmt) == 1); + tree expr = CALL_EXPR_ARG (*stmt, 0); + if (cp_walk_tree (&expr, find_any_await, nullptr, NULL)) + { + if (maybe_constant_value (expr) == expr) + *stmt = integer_zero_node; + else + *stmt = integer_one_node; + } + *do_subtree = 0; + } + } + break; case CO_YIELD_EXPR: /* co_yield is syntactic sugar, re-write it to co_await. */ *stmt = TREE_OPERAND (*stmt, 1); diff --git a/gcc/testsuite/g++.dg/coroutines/pr116775.C b/gcc/testsuite/g++.dg/coroutines/pr116775.C new file mode 100644 index 000..981e27af27b --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr116775.C @@ -0,0 +1,68 @@ +// { dg-additional-options "-fsyntax-only" } +// PR116775 +#include +#ifndef OUTPUT +# define PRINT(X) +# define PRINTF(...) +#else +#include +# define PRINT(X) puts(X) +# define PRINTF(...) printf(__VA_ARGS__) +#endif + +struct awaitable { +awaitable(int n) : delay{n} {} + +constexpr bool await_ready() const noexcept { return false; } +auto await_suspend(std::coroutine_handle<> h) const { +__builtin_trap (); +return false; +} +int await_resume() const noexcept { +return delay; +} + +int delay; +}; + +struct Task { +struct promise_type { +promise_type() = default; +Task get_return_object() { return {}; } +std::suspend_never initial_suspend() { return {}; } +std::suspend_always final_suspend() noexcept { return {}; } +void unhandled_exception() {} +void return_void () {} +awaitable yield_value (int v) { return {v}; } +}; +}; + +Task foo() noexcept { +if (__builtin_constant_p (true ? 1 : co_await awaitable{10})) + PRINT ("const OK"); +else + { +PRINT ("failed : should be const"); +__builtin_abort (); + } +if (__builtin_constant_p (false ? 1 : co_await awaitable{15})) + { +PRINT ("failed : should not be const"); +__builtin_abort (); + } +else + PRINT ("not const, OK"); +if (__builtin_constant_p (5 == (co_yield 42))) + { +PRINT ("failed : should not be const"); +__builtin_abort (); + } +else + PRINT ("not const, OK"); +co_return; +} + +//call foo +int main() { +foo(); +} -- 2.39.2 (Apple
[PATCH v3] c++, coroutines: CWG2563 promise lifetime extension [PR115908].
Hi Jason, A complete re-implementation using a reference count as you suggested in response to discussions of remaining issues. I also discussed some of the points we encountered with one of the original coroutines authors; it is accepted that having two places to cleanup was probably a design flaw - but we are where we are. I added more cases to the existing tests for exception cleanups (95615) and moved the original 115908 into the torture tests since it is for code-gen. (this needs the can_throw fix or some of the 95615 cases will fail) Tested on x86_64-darwin, powerpc64le-linux. OK for trunk? thanks Iain --- 8< --- This implements the final piece of the revised CWG2563 wording; "It exits the scope of promise only if the coroutine completed without suspending." Considering the coroutine to be made up of two components; a 'ramp' and a 'body' where the body represents the user's original code and the ramp is responsible for setup of that and for returning some object to the original caller. Coroutine state, and responsibility for its release. A coroutine has some state that persists across suspensions. The state has two components: * State that is specified by the standard and persists for the entire life of the coroutine. * Local state that is constructed/destructed as scopes in the original function body are entered/exited. The destruction of local state is always the responsibility of the body code. The persistent state (and the overall storage for the state) must be managed in two places: * The ramp function (which allocates and builds this - and can, in some cases, be responsible for destroying it) * The re-written function body which can destroy it when that body completes its final suspend - or when the handle.destroy () is called. In all cases the ramp holds responsibility for constructing the standard- mandated persistent state. There are four ways in which the ramp might be re-entered after starting the function body: A The body could suspend (one might expect that to be the 'normal' case for most coroutine). B The body might complete either synchronously or via continuations. C An exception might be thrown during the setup of the initial await expression, before the initial awaiter resumes. D An exception might be processed by promise.unhandled_exception () and that, in turn, might re-throw it (or throw something else). In this case, the coroutine is considered suspended at the final suspension point. Once the coroutine has passed initial suspend (i.e. the initial awaiter await_resume() has been called) the body is considered to have a use of the state. Until the ramp return value has been constructed, the ramp is considered to have a use of the state. To manage these interacting conditions we allocate a reference counter for the frame state. This is initialised to 2 by the ramp as part of its startup (note that failures/exceptions in the startup code are handled locally to the ramp). When the body completes by passing the final suspend, the reference is decremented (the body no longer has a use). If the reference count is now zero, then the body continues to destruction of the state. Passing final suspend also corresponds to the handle.destroy () call. In cases A and B the return path to the ramp decrements the count (since the ramp return value can now be constructed and the ramp's use of the state is complete). In case C, when the exception arrives at the ramp, the reference count will be 2. We cannot expect to construct the ramp return object. Case D is an outlier, since the responsibility for destruction of the state now rests with the user's code (via a handle.destroy() call). In the case that the body has never suspended before such an exception occurs, the only reasonable way for the user code to obtain the necessary handle is if unhandled_exception() throws the handle or some object that contains the handle. That is outside of the designs here - if the user code might need this corner-case, then such provision will have to be made. Before the unhandled_exception() is called, we save the refcount and then set it to 1 (since the only hold over the frame in the case of a thrown exception is the body, the ramp return object cannot be constructed). The saved state is restored if unhandled_exception returns normally and we carry on to final_suspend. In the ramp, we implement destruction for the persistent frame state by means of cleanups. These are run conditionally when the reference count is either 2 (signalling an exception before the body has fully started) or 0 signalling that both the body and the ramp have completed (i.e. the body ran to completion before the ramp was re-entered). PR c++/115908 PR c++/118074 PR c++/95615 gcc/cp/ChangeLog: * coroutines.cc (coro_frame_refcount_id): New. (coro_init_identifiers): Initialise coro_frame_refcount_id. (bui
[PATCH] c++, coroutines: Handle unevaluated contexts
Tested on x86_64-darwin, powerpc64le-linux, OK for trunk? thanks Iain --- 8< -- >From [expr.await]/2 We should not accept co_await, co_yield in unevaluated contexts. It seems that we had not been marking typeid expressions as unevaluated so that is also added here. gcc/cp/ChangeLog: * coroutines.cc (finish_co_await_expr): Do not allow in an unevaluated context. (finish_co_yield_expr): Likewise. * parser.cc (cp_parser_postfix_expression): Mark typeid expressions as unevaluated. gcc/testsuite/ChangeLog: * g++.dg/coroutines/unevaluated.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 12 ++ gcc/cp/parser.cc | 2 ++ gcc/testsuite/g++.dg/coroutines/unevaluated.C | 24 +++ 3 files changed, 38 insertions(+) create mode 100644 gcc/testsuite/g++.dg/coroutines/unevaluated.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 32fd1c65bf7..4d6ec171cd5 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -1549,6 +1549,12 @@ finish_co_await_expr (location_t kw, tree expr) if (!expr || error_operand_p (expr)) return error_mark_node; + if (cp_unevaluated_operand) +{ + error_at (kw, "%qs cannot be used in an unevaluated context","co_await"); + return error_mark_node; +} + if (!coro_common_keyword_context_valid_p (current_function_decl, kw, "co_await")) return error_mark_node; @@ -1629,6 +1635,12 @@ finish_co_yield_expr (location_t kw, tree expr) if (!expr || error_operand_p (expr)) return error_mark_node; + if (cp_unevaluated_operand) +{ + error_at (kw, "%qs cannot be used in an unevaluated context","co_yield"); + return error_mark_node; +} + /* Check the general requirements and simple syntax errors. */ if (!coro_common_keyword_context_valid_p (current_function_decl, kw, "co_yield")) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 86337635f48..15815f9e61b 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -7957,9 +7957,11 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p, tree expression; /* Look for an expression. */ + ++cp_unevaluated_operand; expression = cp_parser_expression (parser, & idk); /* Compute its typeid. */ postfix_expression = build_typeid (expression, tf_warning_or_error); + --cp_unevaluated_operand; /* Look for the `)' token. */ close_paren = parens.require_close (parser); } diff --git a/gcc/testsuite/g++.dg/coroutines/unevaluated.C b/gcc/testsuite/g++.dg/coroutines/unevaluated.C new file mode 100644 index 000..f763b208cc9 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/unevaluated.C @@ -0,0 +1,24 @@ +// { dg-additional-options "-fsyntax-only" } +#include +#include + +struct Task { +struct promise_type { +promise_type() = default; +Task get_return_object() { return {}; } +std::suspend_never initial_suspend() { return {}; } +std::suspend_always final_suspend() noexcept { return {}; } +void unhandled_exception() {} +void return_void () {} +std::suspend_never yield_value (int) { return {}; } +}; +}; + +// We do not permit co_await, co_yield outside a function, and so uses in +// noexcept or requirements are covered by that. +Task foo() { +const std::type_info& ti1 = typeid (co_await std::suspend_never{}); // { dg-error {'co_await' cannot be used in an unevaluated context} } +std::size_t x = sizeof (co_yield (19)); // { dg-error {'co_yield' cannot be used in an unevaluated context} } +decltype (co_await std::suspend_never{}) A; // { dg-error {'co_await' cannot be used in an unevaluated context} } +co_return; +} -- 2.39.2 (Apple Git-143)
[PATCH] c++, coroutines: Handle await expressions in assume attributes.
Hi Jason, There was some discussion of this in the PR116775 comments. In the end I have matched what clang does in this circumstance, since that seems reasonable - and we may ignore the attributes as needed. tested on x86-64-darwin, powerpc64le-linux, OK for trunk? thanks Iain --- 8< --- Here we have an expression that is not evaluated but is still seen as potentially-evaluated. We handle this by determining if the operand has side-effects, producing a warning that the assume has been ignored and eliding it. gcc/cp/ChangeLog: * coroutines.cc (analyze_expression_awaits): Elide assume attributes containing await expressions, since these have side effects. Emit a diagnostic that this has been done. gcc/testsuite/ChangeLog: * g++.dg/coroutines/assume.C: New test. --- gcc/cp/coroutines.cc | 13 gcc/testsuite/g++.dg/coroutines/assume.C | 40 2 files changed, 53 insertions(+) create mode 100644 gcc/testsuite/g++.dg/coroutines/assume.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index bbdc33abc0e..21b1d85b02c 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -3587,6 +3587,19 @@ analyze_expression_awaits (tree *stmt, int *do_subtree, void *d) } *do_subtree = 0; } + else if (!fn && CALL_EXPR_IFN (*stmt) == IFN_ASSUME) + { + tree expr = CALL_EXPR_ARG (*stmt, 0); + if (TREE_SIDE_EFFECTS (expr)) + { + location_t loc_e = cp_expr_location (expr); + location_t loc_s = cp_expr_location (*stmt); + location_t loc_n = make_location (loc_e, loc_s, loc_s); + warning_at (loc_n, OPT_Wattributes,"assumption ignored" + " because it contains possible side-effects"); + *stmt = build_empty_stmt (loc_n); + } + } } break; case CO_YIELD_EXPR: diff --git a/gcc/testsuite/g++.dg/coroutines/assume.C b/gcc/testsuite/g++.dg/coroutines/assume.C new file mode 100644 index 000..ad979b4956e --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/assume.C @@ -0,0 +1,40 @@ +// { dg-additional-options "-fsyntax-only -Wattributes" } + +#include + +struct awaitable { +awaitable (int n) : delay{n} {} + +constexpr bool await_ready () const noexcept { return false; } +auto await_suspend (std::coroutine_handle<> h) const { +__builtin_abort (); +return false; +} +int await_resume() const noexcept { +return delay; +} + +int delay; +}; + +struct Task { +struct promise_type { +promise_type() = default; +Task get_return_object() { return {}; } +std::suspend_never initial_suspend() { return {}; } +std::suspend_always final_suspend() noexcept { return {}; } +void unhandled_exception() {} +void return_void () {} +awaitable yield_value (int v) { return {v}; } +}; +}; + +int h () { return 5; } + +Task foo() noexcept { +int x = 5; +[[assume (x == 5)]]; +[[assume (co_await awaitable{10})]]; // { dg-warning {assumption ignored because it contains possible side-effects} } +[[assume ((h(),co_await awaitable{11}))]]; // { dg-warning {assumption ignored because it contains possible side-effects} } +co_return; +} -- 2.39.2 (Apple Git-143)
Re: [PATCH] expand, ranger: Use ranger during expansion [PR120434]
On Sun, Jun 08, 2025 at 10:49:44AM +0200, Richard Biener wrote: > I'm also a bit nervous about this given during RTL expansion the IL is > neither fully GIMPLE nor fully RTL. Given we do not perform many > range queries we might be just lucky to not run into any issues? So, I've added following incremental patch: --- gcc/expr.cc.jj 2025-06-09 15:07:38.882265627 +0200 +++ gcc/expr.cc 2025-06-09 16:18:26.988947027 +0200 @@ -66,6 +66,7 @@ along with GCC; see the file COPYING3. #include "tree-pretty-print.h" #include "flags.h" #include "internal-fn.h" +#include "value-query.h" /* If this is nonzero, we do not bother generating VOLATILE @@ -11348,6 +11348,25 @@ expand_expr_real_1 (tree exp, rtx target LAST_VIRTUAL_REGISTER + 1); } + { + gimple *cg = currently_expanding_gimple_stmt; + if (INTEGRAL_TYPE_P (TREE_TYPE (exp))) + { + int_range_max vr; + get_range_query (cfun)->range_of_expr (vr, exp, cg); + } + else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (exp))) + { + frange vr; + get_range_query (cfun)->range_of_expr (vr, exp, cg); + } + else if (POINTER_TYPE_P (TREE_TYPE (exp))) + { + prange vr; + get_range_query (cfun)->range_of_expr (vr, exp, cg); + } + } + g = get_gimple_for_ssa_name (exp); /* For EXPAND_INITIALIZER try harder to get something simpler. */ if (g == NULL (in first attempt this was guarded with if (flag_checking), but that was a bad idea, because it then causes bootstrap comparison failure, one stage built with -fno-checking, another with -fchecking and the different range queries result in different answers from the other range queries). I don't think we want to guard that with #if CHECKING_P either though, if it can cause different code generation (I think there were about 15 .bad_compare files on x86_64 and none on i686), then we'd be testing for months something that we'll actually not release afterwards. And the only issue it found (in x86_64 and i686-linux bootstraps/regtests) is that the 4 remove_edge calls done during expansion will not remove phi arguments and so ranger can then ICE. With the following incremental patch I don't see other problems. --- gcc/cfgexpand.h.jj 2025-04-08 14:08:48.481320427 +0200 +++ gcc/cfgexpand.h 2025-06-09 15:01:52.946766978 +0200 @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. extern tree gimple_assign_rhs_to_tree (gimple *); extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *); +extern void expand_remove_edge (edge); extern void set_parm_rtl (tree, rtx); --- gcc/cfgexpand.cc.jj 2025-06-09 14:18:56.0 +0200 +++ gcc/cfgexpand.cc2025-06-09 15:01:36.423981975 +0200 @@ -2818,6 +2818,19 @@ label_rtx_for_bb (basic_block bb) } +/* Wrapper around remove_edge during expansion. */ + +void +expand_remove_edge (edge e) +{ + if (current_ir_type () != IR_GIMPLE + && (e->dest->flags & BB_RTL) == 0 + && !gimple_seq_empty_p (phi_nodes (e->dest))) +remove_phi_args (e); + remove_edge (e); +} + + /* A subroutine of expand_gimple_cond. Given E, a fallthrough edge of a basic block where we just expanded the conditional at the end, possibly clean up the CFG and instruction sequence. LAST is the @@ -2840,7 +2853,7 @@ maybe_cleanup_end_of_block (edge e, rtx_ if (BARRIER_P (get_last_insn ())) { rtx_insn *insn; - remove_edge (e); + expand_remove_edge (e); /* Now, we have a single successor block, if we have insns to insert on the remaining edge we potentially will insert it at the end of this block (if the dest block isn't feasible) @@ -4460,7 +4473,7 @@ expand_gimple_tailcall (basic_block bb, if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)) e->dest->count -= e->count (); probability += e->probability; - remove_edge (e); + expand_remove_edge (e); } else ei_next (&ei); @@ -7311,7 +7324,7 @@ pass_expand::execute (function *fun) In the future we should get this fixed properly. */ if ((e->flags & EDGE_ABNORMAL) && !(e->flags & EDGE_SIBCALL)) - remove_edge (e); + expand_remove_edge (e); else ei_next (&ei); } --- gcc/stmt.cc.jj 2025-04-08 14:09:00.253156547 +0200 +++ gcc/stmt.cc 2025-06-09 15:18:30.714708828 +0200 @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. #include "tree-cfg.h" #include "dumpfile.h" #include "builtins.h" +#include "cfgexpand.h" /* Functions and data structures for expanding case statements. */ @@ -1025,7 +1026,7 @@ expand_case (gswitch *stmt) && gimple_seq_unreachable_p (bb_seq (default_edge->dest))) { default_label = NULL; - remove_edge (default_edge); + expand_remove_edge (default_edge); default_edge = NULL;
Re: [PATCH v6 1/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 1) [PR109071,PR85788,PR88771,PR106762,PR1
> On Jun 6, 2025, at 03:31, Richard Biener wrote: > > On Fri, May 30, 2025 at 5:13 PM Qing Zhao wrote: >> >> Hi, Richard, >> >> Really appreciate for your suggestions. >> >>> On May 30, 2025, at 05:22, Richard Biener >>> wrote: >>> >>> On Fri, May 23, 2025 at 10:49 PM Qing Zhao wrote: Hi, Richard, Thanks a lot for your comments and questions. Please see my answers embedded below: > On May 19, 2025, at 06:44, Richard Biener > wrote: > > On Fri, May 16, 2025 at 3:34 PM Qing Zhao wrote: >> >> Control this with a new option -fdiagnostics-details. >> >> $ cat t.c >> extern void warn(void); >> static inline void assign(int val, int *regs, int *index) >> { >> if (*index >= 4) >> warn(); >> *regs = val; >> } >> struct nums {int vals[4];}; >> >> void sparx5_set (int *ptr, struct nums *sg, int index) >> { >> int *val = &sg->vals[index]; >> >> assign(0,ptr, &index); >> assign(*val, ptr, &index); >> } >> >> $ gcc -Wall -O2 -c -o t.o t.c >> t.c: In function ‘sparx5_set’: >> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ >> [-Warray-bounds=] >> 12 | int *val = &sg->vals[index]; >>| ^~~ >> t.c:8:18: note: while referencing ‘vals’ >> 8 | struct nums {int vals[4];}; >>| ^~~~ >> >> In the above, Although the warning is correct in theory, the warning >> message >> itself is confusing to the end-user since there is information that >> cannot >> be connected to the source code directly. >> >> It will be a nice improvement to add more information in the warning >> message >> to report where such index value come from. >> >> In order to achieve this, we add a new data structure "move_history" to >> record >> 1. the "condition" that triggers the code movement; >> 2. whether the code movement is on the true path of the "condition"; >> 3. the "compiler transformation" that triggers the code movement. >> >> Whenever there is a code movement along control flow graph due to some >> specific transformations, such as jump threading, path isolation, tree >> sinking, etc., a move_history structure is created and attached to the >> moved gimple statement. >> >> During array out-of-bound checking or -Wstringop-* warning checking, the >> "move_history" that was attached to the gimple statement is used to form >> a sequence of diagnostic events that are added to the corresponding rich >> location to be used to report the warning message. >> >> This behavior is controled by the new option -fdiagnostics-details >> which is off by default. >> >> With this change, by adding -fdiagnostics-details, >> the warning message for the above testing case is now: >> >> $ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c >> t.c: In function ‘sparx5_set’: >> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ >> [-Warray-bounds=] >> 12 | int *val = &sg->vals[index]; >>| ^~~ >> ‘sparx5_set’: events 1-2 >> 4 | if (*index >= 4) >>| ^ >>| | >>| (1) when the condition is evaluated to true >> .. >> 12 | int *val = &sg->vals[index]; >>| ~~~ >>| | >>| (2) out of array bounds here >> t.c:8:18: note: while referencing ‘vals’ >> 8 | struct nums {int vals[4];}; >>| ^~~~ >> >> The change was divided into 3 parts: >> >> Part 1: Add new data structure move_history, record move_history during >> transformation; >> Part 2: In warning analysis, Use the new move_history to form a rich >> location with a sequence of events, to report more context info >> of the warnings. >> Part 3: Add debugging mechanism for move_history. > > Thanks for working on this. I'm pasting my review notes on [1/3] below. > > > ove histories are allocated from a global obstack - it seems they are > never released, just the association between stmt and history is > eventually broken by remove_move_history? They are supposed to be released by the following routine: +void +move_history_finalize (void) +{ + if (move_history_map) +{ + delete move_history_map; + move_history_map = NULL; +} + obstack_free (&move_history_obstack, NULL); + return; +} Is the above “obstack_free” the proper utility routine that are used to free the whole memory allocated for the“move_history_obstack”? If not, please advise on how to free the obsta
[PATCH] c++, coroutines: Ensure that the resumer is marked as can_throw.
I was planning to apply this as obvious - but it is needed for the next patch to be posted - so noting here now. I discussed with one of the original coroutines paper authors the idea that, if the original function was marked noexcept, then we should carry that onto the outlined body. This was not the design intent, so it is quite expected that the ramp might be noexcept - but that a resumer via a handle could still receive an exception. Given this and the fact we unconditionally add exceptions code to the body, I consider this patch to be obvious and the EH code certainly gets very confused without it (fun to debug). Any comments? thanks Iain --- 8<--- We must flag that the resumer might throw (since the wrapping of the original function body unconditionally adds a try-catch/rethrow). We also add code that might throw - even when the original function body would not. gcc/cp/ChangeLog: * coroutines.cc (build_actor_fn): Set can_throw. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 18c0a4812c4..d82fa46f208 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -2412,6 +2412,9 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, bool spf = start_preparsed_function (actor, NULL_TREE, SF_PRE_PARSED); gcc_checking_assert (spf); gcc_checking_assert (cfun && current_function_decl && TREE_STATIC (actor)); + if (flag_exceptions) +/* We, unconditionally, add a try/catch and rethrow. */ +cp_function_chain->can_throw = true; tree stmt = begin_function_body (); tree actor_bind = build3 (BIND_EXPR, void_type_node, NULL, NULL, NULL); -- 2.39.2 (Apple Git-143)
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
On Mon, Jun 09, 2025 at 04:42:49PM +0100, Richard Sandiford wrote: > Stafford Horne writes: > > On Fri, Jun 06, 2025 at 04:41:21PM +0100, Stafford Horne wrote: > >> On Fri, Jun 06, 2025 at 04:20:10PM +0100, Richard Sandiford wrote: > >> > Stafford Horne writes: > >> > > Hello, > >> > > > >> > > This seems to be causing a build regression on the or1k port. > >> > > > >> > > I have not quite figured it out yet but I have bisected to this commit. > >> > > > >> > > The failure is as below, this seems to be caused by the cstoresi4 > >> > > instruction produced by > >> > > or1k.md. So I think its likely something we are doing funny in the > >> > > port. > >> > > > >> > > Thanks to Jeff for pointing this out. If you have any idea let me > >> > > know. > >> > > > >> > > Log: > >> > > > >> > > during RTL pass: ce1 > >> > > dump file: libgcc2.c.286r.ce1 > >> > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c: In function > >> > > '__muldi3': > >> > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c:538:1: > >> > > internal compiler error: in emit_move_multi_word, at expr.cc:4492 > >> > > 538 | } > >> > > | ^ > >> > > 0x1b094df internal_error(char const*, ...) > >> > > > >> > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic-global-context.cc:517 > >> > > 0x6459c1 fancy_abort(char const*, int, char const*) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic.cc:1815 > >> > > 0x473a2e emit_move_multi_word > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4492 > >> > > 0x93fd7d emit_move_insn(rtx_def*, rtx_def*) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4746 > >> > > 0x9175f1 copy_to_reg(rtx_def*) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/explow.cc:630 > >> > > 0xd18977 gen_lowpart_general(machine_mode, rtx_def*) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/rtlhooks.cc:56 > >> > > 0x9414c7 convert_mode_scalar > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:818 > >> > > 0x9421a1 convert_modes(machine_mode, machine_mode, rtx_def*, int) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:961 > >> > > 0xc0afa0 prepare_operand(insn_code, rtx_def*, int, machine_mode, > >> > > machine_mode, int) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4637 > >> > > 0xc0b33a prepare_cmp_insn > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4538 > >> > > 0xc0f0dd emit_conditional_move(rtx_def*, rtx_comparison, rtx_def*, > >> > > rtx_def*, machine_mode, int) > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:5098 > >> > > 0x192cc7b noce_emit_cmove > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:1777 > >> > > 0x193365b try_emit_cmove_seq > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3410 > >> > > 0x193365b noce_convert_multiple_sets_1 > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3705 > >> > > 0x1933c71 noce_convert_multiple_sets > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3496 > >> > > 0x1938687 noce_process_if_block > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4045 > >> > > 0x1938687 noce_find_if_block > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4726 > >> > > 0x1938687 find_if_header > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4931 > >> > > 0x1938687 if_convert > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6078 > >> > > 0x1938d01 rest_of_handle_if_conversion > >> > >/home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6143 > >> > > > >> > > -Stafford > >> > > >> > Can you attach the .i? > >> > >> Sure please find attached, I can reproduce with the or1k compiler using: > >> > >> $HOME/work/gnu-toolchain/build-gcc/./gcc/cc1 -O2 __muldi3.i > >> > >> I am still debugging, but I notice when reverting the patch the compiler > >> never > >> goes to emit_move_multi_word(). I am trying to work backwards as to where > >> it goes > >> different, it's just taking a bit of time as I haven't debugged GCC ICE's > >> for a > >> while. > > > > I made some progress on this. In openRISC we generate cstoresi4 with > > special > > register SR_F (condition flag) bit register. > > > > This is represented as (reg:BI 34 ?sr_f). > > > > The RTL produced for cstoresi4 becomes something like: > > > > // The compare > > (insn 49 48 50 (set (reg:BI 34 ?sr_f) > > (leu:BI (reg/v:SI 68 [ __x2D.6783 ]) > > (reg/v:SI 69 [ __x1D.6782 ]))) > > "/home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c":532:22 -1 > >(nil) > > // The conditional branch > > (jump_insn 50 49 0 (set (pc) > > (if_then_else (ne (reg:BI 34 ?sr_f) > > (const_int 0 [0])) > >
Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern [combine question]
On 4/1/25 9:35 PM, Jeff Law wrote: Segher -- there's a combine question near the end... On 3/23/25 8:43 PM, Bohan Lei wrote: The combine pass can generate an index like (and:DI (mult:DI (reg:DI) (const_int scale)) (const_int mask)) when XTheadMemIdx is available. LRA may pull it out, and thus a splitter is needed when Zba is not available. A similar splitter were introduced when XTheadMemIdx support was added, but removed in commit 31c3c5d. The new splitter in this new patch is based on the removed one. gcc/ChangeLog: * config/riscv/thead.md (*th_memidx_operand): New splitter. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadmemidx-bug.c: New test. So returning to this So the combine pass doesn't reject combination into an ASM_INPUT, just combination from most ASM_INPUTs. It does rely on predicates to determine validity. So my suspicion is that the memory_operand (or one of the closely related) predicates is returning true for an addressing mode is that is not always valid. Or to put it another way, addressing mode validity is based on two properties. The form of the address and the mode of the memory reference. If any other context is needed to determine validity, then that addressing format must not be considered valid for the given memory access mode. I hope I didn't steer you the wrong way on that issue Christoph! So at best this splitter is still just papering over a bigger problem that needs to be fixed. Jeff
Re: [PATCH, FYI?] [arm] [vxworks] require thumb2 for pr120424.C
On 08/06/2025 14:15, Alexandre Oliva wrote: > > VxWorks kernel mode doesn't support thumb code, so the test fails. > Require thumb2 support. You already have -march=armv7, so that implies any thumb code will be thumb2. So this doesn't really make sense as this is a compile-only test. Furthermore, this will disable the test entirely if generating Arm code. Is that really what you want? > > Tested with arm-vx7r2 on gcc-14, where this came up, and with arm-eabi > on trunk. Absent objections, I'm going to install this as obvious in > the next few days. > > > for gcc/testsuite/ChangeLog > > * g++.target/arm/pr120424.C: Require thumb2 support. > --- > gcc/testsuite/g++.target/arm/pr120424.C |1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/testsuite/g++.target/arm/pr120424.C > b/gcc/testsuite/g++.target/arm/pr120424.C > index 4d0e49013c04a..620089ab56225 100644 > --- a/gcc/testsuite/g++.target/arm/pr120424.C > +++ b/gcc/testsuite/g++.target/arm/pr120424.C > @@ -1,4 +1,5 @@ > /* { dg-do compile } */ > +/* { dg-require-effective-target arm_thumb2_ok } */ > /* { dg-options "-march=armv7 -O2 -fstack-clash-protection > -fnon-call-exceptions" } */ > /* { dg-final { scan-assembler-not {#-8} } } */ > /* LRA register elimination gets confused when register spilling > >
Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
On Sun, Jun 08, 2025 at 09:05:07AM -0600, Jeff Law wrote: > > > On 6/7/25 4:38 AM, Stafford Horne wrote: > > > > > ## Note on cstoresi4 and PUT_MODE. > > > > There was some concern raised by Jeff about the use of PUT_MODE in > > cstoresi4. > > This is here to convert a 'ne' to 'ne:SI', for example, allowing the > > generated > > RTL to match with the "*scc" insn pattern later. > > > > It seems to be working OK so I will leave this as is for now. But if you > > have > > any suggestions let me know. > Well, it's a sign that something is wrong. Patterns shouldn't be changing > the mode or code of an existing RTL node (say that 5 fast :-) Understood, just some explaination, in this case we are updating RTL that was just generated by or1k_expand_compare().. /* Adjust the operands for use in the caller. */ operands[0] = flag_check_ne ? gen_rtx_NE (VOIDmode, sr_f, const0_rtx) : gen_rtx_EQ (VOIDmode, sr_f, const0_rtx); operands[1] = sr_f; operands[2] = const0_rtx; } Depending on the pattern we are expanding we want either: (ne (reg:BI 34 ?sr_f)(const_int 0 [0])) Or: (ne:SI (reg:BI 34 ?sr_f)(const_int 0 [0])) The PUT_MODE does the final adjustment. As a cleanup, the PUT_MODE could be moved into or1k_expand_compare. > I don't mind deferring since apparently it's not playing a role in this > issue, but that code is definitely very fishy. OK. -Stafford
[PATCH] c++: Save 8 further bytes from lang_type allocations
Hi! The following patch implements the /* FIXME reuse another field? */ comment on the lambda_expr member. I think (and asserts in the patch seem to confirm) CLASSTYPE_KEY_METHOD is only ever non-NULL for TYE_POLYMORPHIC_P and on the other side CLASSTYPE_LAMBDA_EXPR is only used on closure types which are never polymorphic. So, the patch just uses one member for both, with the accessor macros changed to be no longer lvalues and adding SET_* variants of the macros for setters. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-06-09 Jakub Jelinek * cp-tree.h (struct lang_type): Add comment before key_method. Remove lambda_expr. (CLASSTYPE_KEY_METHOD): Give NULL_TREE if not TYPE_POLYMORPHIC_P. (SET_CLASSTYPE_KEY_METHOD): Define. (CLASSTYPE_LAMBDA_EXPR): Give NULL_TREE if TYPE_POLYMORPHIC_P. Use key_method member instead of lambda_expr. (SET_CLASSTYPE_LAMBDA_EXPR): Define. * class.cc (determine_key_method): Use SET_CLASSTYPE_KEY_METHOD macro. * decl.cc (xref_tag): Use SET_CLASSTYPE_LAMBDA_EXPR macro. * lambda.cc (begin_lambda_type): Likewise. * module.cc (trees_in::read_class_def): Use SET_CLASSTYPE_LAMBDA_EXPR and SET_CLASSTYPE_KEY_METHOD macros, assert lambda is NULL if TYPE_POLYMORPHIC_P and otherwise assert key_method is NULL. --- gcc/cp/cp-tree.h.jj 2025-06-09 12:51:02.905724560 +0200 +++ gcc/cp/cp-tree.h2025-06-09 14:09:45.880851279 +0200 @@ -2511,11 +2511,11 @@ struct GTY(()) lang_type { vec *pure_virtuals; tree friend_classes; vec * GTY((reorder ("resort_type_member_vec"))) members; + /* CLASSTYPE_KEY_METHOD for TYPE_POLYMORPHIC_P types, CLASSTYPE_LAMBDA_EXPR + otherwise. */ tree key_method; tree decl_list; tree befriending_classes; - /* FIXME reuse another field? */ - tree lambda_expr; union maybe_objc_info { /* If not c_dialect_objc, this part is not even allocated. */ char GTY((tag ("0"))) non_objc; @@ -2638,7 +2638,13 @@ struct GTY(()) lang_type { /* The member function with which the vtable will be emitted: the first noninline non-pure-virtual member function. NULL_TREE if there is no key function or if this is a class template */ -#define CLASSTYPE_KEY_METHOD(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->key_method) +#define CLASSTYPE_KEY_METHOD(NODE) \ + (TYPE_POLYMORPHIC_P (NODE) \ + ? LANG_TYPE_CLASS_CHECK (NODE)->key_method \ + : NULL_TREE) +#define SET_CLASSTYPE_KEY_METHOD(NODE, VALUE) \ + (gcc_checking_assert (TYPE_POLYMORPHIC_P (NODE)),\ + LANG_TYPE_CLASS_CHECK (NODE)->key_method = (VALUE)) /* Vector of members. During definition, it is unordered and only member functions are present. After completion it is sorted and @@ -2770,7 +2776,12 @@ struct GTY(()) lang_type { /* The associated LAMBDA_EXPR that made this class. */ #define CLASSTYPE_LAMBDA_EXPR(NODE) \ - (LANG_TYPE_CLASS_CHECK (NODE)->lambda_expr) + (TYPE_POLYMORPHIC_P (NODE) \ + ? NULL_TREE \ + : LANG_TYPE_CLASS_CHECK (NODE)->key_method) +#define SET_CLASSTYPE_LAMBDA_EXPR(NODE, VALUE) \ + (gcc_checking_assert (!TYPE_POLYMORPHIC_P (NODE)), \ + LANG_TYPE_CLASS_CHECK (NODE)->key_method = (VALUE)) /* The extra mangling scope for this closure type. */ #define LAMBDA_TYPE_EXTRA_SCOPE(NODE) \ (LAMBDA_EXPR_EXTRA_SCOPE (CLASSTYPE_LAMBDA_EXPR (NODE))) --- gcc/cp/class.cc.jj 2025-05-20 08:14:05.551417960 +0200 +++ gcc/cp/class.cc 2025-06-09 14:05:45.196103702 +0200 @@ -7445,7 +7445,7 @@ determine_key_method (tree type) && ! DECL_DECLARED_INLINE_P (method) && ! DECL_PURE_VIRTUAL_P (method)) { - CLASSTYPE_KEY_METHOD (type) = method; + SET_CLASSTYPE_KEY_METHOD (type, method); break; } --- gcc/cp/decl.cc.jj 2025-06-02 11:00:14.802413264 +0200 +++ gcc/cp/decl.cc 2025-06-09 14:08:44.494680807 +0200 @@ -17251,7 +17251,7 @@ xref_tag (enum tag_types tag_code, tree if (IDENTIFIER_LAMBDA_P (name)) /* Mark it as a lambda type right now. Our caller will correct the value. */ - CLASSTYPE_LAMBDA_EXPR (t) = error_mark_node; + SET_CLASSTYPE_LAMBDA_EXPR (t, error_mark_node); t = pushtag (name, t, how); } else --- gcc/cp/lambda.cc.jj 2025-06-02 11:00:14.808413187 +0200 +++ gcc/cp/lambda.cc2025-06-09 14:04:53.458802707 +0200 @@ -150,7 +150,7 @@ begin_lambda_type (tree lambda) /* Cross-reference the expression and the type. */ LAMBDA_EXPR_CLOSURE (lambda) = type; - CLASSTYPE_LAMBDA_EXPR (type) = lambda; + SET_CLASSTYPE_LAMBDA_EXPR (type, lambda); /* In C++17, assume the closure is literal; we'll clear the flag later if necessary. */ --- gcc/cp/module.cc.jj 2025-06-02 11:00:14.835412837 +0200 +++ gcc/cp/module.cc2025-06-09 14:08:09.107159009 +0200 @@ -13243,13 +132
[PATCH] c++, v2: Don't incorrectly reject override after class head name [PR120569]
On Mon, Jun 09, 2025 at 12:17:12PM -0400, Jason Merrill wrote: > > While the > > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2786r13.html#c03-compatibility-changes-for-annex-c-diff.cpp03.dcl.dcl > > hunk dropped because > > struct C {}; struct C {} final {}; > > is actually not valid C++98 (which didn't have list initialization), we > > actually also reject > > struct D {}; struct D {} override {}; > > and that IMHO is valid all the way from C++11 onwards. > > I don't see how it could be, and other compilers agree with me. Since the > testcases don't put {} between the class name and the contextual keyword, I > think this is just a typo in the comment. Oops, sure, consider the second {} pair removed. Sorry. > > Especially in the light of P2786R13 adding new contextual keywords, I think > > it is better to use a separate routine for parsing the > > class-virt-specifier-seq (in C++11, there was export next to final), > > class-virt-specifier (in C++14 to C++23) and > > class-property-specifier-seq (in C++26) instead of using the same function > > for virt-specifier-seq and class-property-specifier-seq. > > > > Tested on x86_64-linux and i686-linux, ok for trunk? > > > > 2025-06-06 Jakub Jelinek > > > > PR c++/120569 > > * parser.cc (cp_parser_class_virt_specifier_seq_opt): New function. > > Let's refer to the new non-terminal rather than the old one, both in the > name and comment of this function. That was actually my first version of the patch, but thought it would be weird to talk about C++26 name where the support isn't even in. Anyway, changed now. Is it ok like that or do you want also to rename the virt_specifiers and virt_specifier automatic variables and perhaps use a different enum? > > + if (virt_specifiers & virt_specifier) > > + { > > + gcc_rich_location richloc (token->location); > > + richloc.add_fixit_remove (); > > + error_at (&richloc, "duplicate class-virt-specifier"); > > And here let's name the actual duplicate specifier rather refer to the > grammar. Changed. Though if this is ever backported, perhaps it should use the same diagnostics as before in the backports. So like this? So far lightly tested. 2025-06-09 Jakub Jelinek PR c++/120569 * parser.cc (cp_parser_class_property_specifier_seq_opt): New function. (cp_parser_class_head): Use it instead of cp_parser_property_specifier_seq_opt. Don't diagnose VIRT_SPEC_OVERRIDE here. Formatting fix. * g++.dg/cpp0x/override2.C: Expect different diagnostics with override or duplicate final. * g++.dg/cpp0x/override5.C: New test. * g++.dg/cpp0x/duplicate1.C: Expect different diagnostics with duplicate final. --- gcc/cp/parser.cc.jj 2025-06-02 11:00:14.0 +0200 +++ gcc/cp/parser.cc2025-06-06 17:20:16.344079071 +0200 @@ -28005,6 +28005,57 @@ cp_parser_class_specifier (cp_parser* pa return type; } +/* Parse an (optional) class-property-specifier-seq. + + class-property-specifier-seq: + class-property-specifier class-property-specifier-seq [opt] + + class-property-specifier: + final + + Returns a bitmask representing the class-property-specifiers. */ + +static cp_virt_specifiers +cp_parser_class_property_specifier_seq_opt (cp_parser *parser) +{ + cp_virt_specifiers virt_specifiers = VIRT_SPEC_UNSPECIFIED; + + while (true) +{ + cp_token *token; + cp_virt_specifiers virt_specifier; + + /* Peek at the next token. */ + token = cp_lexer_peek_token (parser->lexer); + /* See if it's a class-property-specifier. */ + if (token->type != CPP_NAME) + break; + if (id_equal (token->u.value, "final")) + { + maybe_warn_cpp0x (CPP0X_OVERRIDE_CONTROLS); + virt_specifier = VIRT_SPEC_FINAL; + } + else if (id_equal (token->u.value, "__final")) + virt_specifier = VIRT_SPEC_FINAL; + else + break; + + if (virt_specifiers & virt_specifier) + { + gcc_rich_location richloc (token->location); + richloc.add_fixit_remove (); + error_at (&richloc, "duplicate %qD specifier", token->u.value); + cp_lexer_purge_token (parser->lexer); + } + else + { + cp_lexer_consume_token (parser->lexer); + virt_specifiers |= virt_specifier; + } +} + return virt_specifiers; +} + /* Parse a class-head. class-head: @@ -28195,12 +28246,10 @@ cp_parser_class_head (cp_parser* parser, pop_deferring_access_checks (); if (id) -{ - cp_parser_check_for_invalid_template_id (parser, id, - class_key, - type_start_token->location); -} - virt_specifiers = cp_parser_virt_specifier_seq_opt (parser); +cp_parser_check_for_invalid_template_id (parser, id, +class_key, +
[PATCH] c, c++: Save 8 bytes of memory in lang_type for non-ObjC*
Hi! For C++26 P2786R13 I'm afraid I'll need 4 new flags on class types in struct lang_type (1 bit for trivially_relocatable_if_eligible, 1 for replaceable_if_eligible, 1 for not_trivially_relocatable and 1 for not_replaceable) and there are just 2 bits left. The following patch is an attempt to save 8 bytes of memory in those structures when not compiling ObjC or ObjC++ (I think those are used fairly rarely and the patch keeps the sizes unmodified for those 2). The old allocations were 32 bytes for C and 120 bytes for C++. The patch moves the objc_info member last in the C++ case (it was already last in the C case), arranges for GC to skip it for C and C++ but walk for ObjC and ObjC++ and allocates or copies over just offsetof bytes instead of sizeof. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-06-09 Jakub Jelinek gcc/c/ * c-lang.h (union lang_type::maybe_objc_info): New type. (struct lang_type): Use union maybe_objc_info info member instead of tree objc_info. * c-decl.cc (finish_struct): Allocate struct lang_type using ggc_internal_cleared_alloc instead of ggc_cleared_alloc, and use sizeof (struct lang_type) for ObjC and otherwise offsetof (struct lang_type, info) as size. (finish_enum): Likewise. gcc/cp/ * cp-tree.h (union lang_type::maybe_objc_info): New type. (struct lang_type): Use union maybe_objc_info info member instead of tree objc_info. * lex.cc (copy_lang_type): Use sizeof (struct lang_type) just for ObjC++ and otherwise offsetof (struct lang_type, info). (maybe_add_lang_type_raw): Likewise. (cxx_make_type): Formatting fix. gcc/objc/ * objc-act.h (TYPE_OBJC_INFO): Define to info.objc_info instead of objc_info. gcc/objcp/ * objcp-decl.h (TYPE_OBJC_INFO): Define to info.objc_info instead of objc_info. --- gcc/c/c-lang.h.jj 2025-04-08 14:08:48.398321583 +0200 +++ gcc/c/c-lang.h 2025-06-09 12:50:21.631284689 +0200 @@ -35,10 +35,14 @@ struct GTY(()) lang_type { /* In an ENUMERAL_TYPE, the min and max values. */ tree enum_min; tree enum_max; - /* In a RECORD_TYPE, information specific to Objective-C, such - as a list of adopted protocols or a pointer to a corresponding - @interface. See objc/objc-act.h for details. */ - tree objc_info; + union maybe_objc_info { +/* If not c_dialect_objc, this part is not even allocated. */ +char GTY((tag ("0"))) non_objc; +/* In a RECORD_TYPE, information specific to Objective-C, such + as a list of adopted protocols or a pointer to a corresponding + @interface. See objc/objc-act.h for details. */ +tree GTY((tag ("1"))) objc_info; + } GTY ((desc ("c_dialect_objc ()"))) info; }; struct GTY(()) lang_decl { --- gcc/c/c-decl.cc.jj 2025-06-04 17:21:01.111783263 +0200 +++ gcc/c/c-decl.cc 2025-06-09 13:10:56.724561444 +0200 @@ -9790,12 +9790,17 @@ finish_struct (location_t loc, tree t, t len += list_length (x); /* Use the same allocation policy here that make_node uses, to - ensure that this lives as long as the rest of the struct decl. - All decls in an inline function need to be saved. */ + ensure that this lives as long as the rest of the struct decl. + All decls in an inline function need to be saved. */ - space = ggc_cleared_alloc (); - space2 = (sorted_fields_type *) ggc_internal_alloc - (sizeof (struct sorted_fields_type) + len * sizeof (tree)); + space = ((struct lang_type *) +ggc_internal_cleared_alloc (c_dialect_objc () +? sizeof (struct lang_type) +: offsetof (struct lang_type, +info))); + space2 = ((sorted_fields_type *) + ggc_internal_alloc (sizeof (struct sorted_fields_type) + + len * sizeof (tree))); len = 0; space->s = space2; @@ -10269,7 +10274,10 @@ finish_enum (tree enumtype, tree values, /* Record the min/max values so that we can warn about bit-field enumerations that are too small for the values. */ - lt = ggc_cleared_alloc (); + lt = ((struct lang_type *) + ggc_internal_cleared_alloc (c_dialect_objc () + ? sizeof (struct lang_type) + : offsetof (struct lang_type, info))); lt->enum_min = minnode; lt->enum_max = maxnode; TYPE_LANG_SPECIFIC (enumtype) = lt; --- gcc/cp/cp-tree.h.jj 2025-06-07 09:46:31.602393864 +0200 +++ gcc/cp/cp-tree.h2025-06-09 12:51:02.905724560 +0200 @@ -2514,12 +2514,16 @@ struct GTY(()) lang_type { tree key_method; tree decl_list; tree befriending_classes; - /* In a RECORD_TYPE, information specific to Objective-C++, such - as a list of adopted prot
[PATCH] libstdc++: Implement LWG3528 make_from_tuple can perform (the equivalent of) a C-style cast
From: Yihan Wang Implement LWG3528 to make std::make_from_tuple SFINAE friendly. libstdc++-v3/ChangeLog: * include/std/tuple (__can_make_from_tuple): New variable template. (__make_from_tuple_impl): Add static_assert. (make_from_tuple): Constrain using __can_make_from_tuple. * testsuite/20_util/tuple/dr3528.cc: New test. Signed-off-by: Yihan Wang Co-authored-by: Jonathan Wakely --- Here's a simpler version of Yihan's patch, with constraints on std::make_from_tuple but not on __make_from_tuple_impl, and with an original test rather than one copied from MSVC and libc++. Lightly tested on x86_64-linux, full testing underway. libstdc++-v3/include/std/tuple| 24 +- .../testsuite/20_util/tuple/dr3528.cc | 44 +++ 2 files changed, 66 insertions(+), 2 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/tuple/dr3528.cc diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple index 2e69af13a98b..b39ce710984c 100644 --- a/libstdc++-v3/include/std/tuple +++ b/libstdc++-v3/include/std/tuple @@ -2939,19 +2939,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #ifdef __cpp_lib_make_from_tuple // C++ >= 17 + template >>> +constexpr bool __can_make_from_tuple = false; + + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3528. make_from_tuple can perform (the equivalent of) a C-style cast + template +constexpr bool __can_make_from_tuple<_Tp, _Tuple, index_sequence<_Idx...>> + = is_constructible_v<_Tp, + decltype(std::get<_Idx>(std::declval<_Tuple>()))...>; + template constexpr _Tp __make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>) -{ return _Tp(std::get<_Idx>(std::forward<_Tuple>(__t))...); } +{ + static_assert(__can_make_from_tuple<_Tp, _Tuple, index_sequence<_Idx...>>); + return _Tp(std::get<_Idx>(std::forward<_Tuple>(__t))...); +} #if __cpp_lib_tuple_like // >= C++23 template #else template #endif -constexpr _Tp +constexpr auto make_from_tuple(_Tuple&& __t) noexcept(__unpack_std_tuple) +#ifdef __cpp_concepts // >= C++20 +-> _Tp +requires __can_make_from_tuple<_Tp, _Tuple> +#else +-> __enable_if_t<__can_make_from_tuple<_Tp, _Tuple>, _Tp> +#endif { constexpr size_t __n = tuple_size_v>; #if __has_builtin(__reference_constructs_from_temporary) diff --git a/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc b/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc new file mode 100644 index ..a96c6b75e209 --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc @@ -0,0 +1,44 @@ +// { dg-do compile { target c++17 } } + +// LWG 3528. make_from_tuple can perform (the equivalent of) a C-style cast + +#include +#include +#include + +template +using make_t = decltype(std::make_from_tuple(std::declval())); + +template +constexpr bool can_make = false; +template +constexpr bool can_make>> = true; + +static_assert( can_make> ); +static_assert( can_make&> ); +static_assert( can_make&> ); +static_assert( can_make> ); +static_assert( can_make&&> ); +static_assert( can_make, std::pair> ); +static_assert( can_make> ); +static_assert( can_make> ); +static_assert( can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make&> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); + +struct Two +{ + Two(const char*, int); +}; + +static_assert( can_make> ); +static_assert( ! can_make> ); +static_assert( can_make> ); +static_assert( ! can_make> ); -- 2.49.0
Re: [PATCH] emit-rtl: Use simplify_subreg_regno to validate hardware subregs [PR119966]
On Sun, Jun 08, 2025 at 09:09:44AM -0600, Jeff Law wrote: > > > On 6/5/25 2:16 PM, Dimitar Dimitrov wrote: > > PR119966 showed that combine could generate unfoldable hardware subregs > > for pru-unknown-elf. To fix, strengthen the checks performed by > > validate_subreg. > > > > The simplify_subreg_regno performs more validity checks than > > the simple info.representable_p. Most importantly, the > > targetm.hard_regno_mode_ok hook is called to ensure the hardware > > register is valid in subreg's outer mode. This fixes the rootcause > > for PR119966. > > > > The checks for stack-related registers are bypassed because the i386 > > backend generates them, in this seemingly valid peephole optimization: > > > > ;; Attempt to always use XOR for zeroing registers (including FP modes). > > (define_peephole2 > > [(set (match_operand 0 "general_reg_operand") > > (match_operand 1 "const0_operand"))] > > "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD > >&& (! TARGET_USE_MOV0 || optimize_insn_for_size_p ()) > >&& peep2_regno_dead_p (0, FLAGS_REG)" > > [(parallel [(set (match_dup 0) (const_int 0)) > > (clobber (reg:CC FLAGS_REG))])] > > "operands[0] = gen_lowpart (word_mode, operands[0]);") > > > > Testing done: > >* No regressions were detected for C and C++ on x86_64-pc-linux-gnu. > >* "contrib/compare-all-tests i386" showed no difference in code > > generation. > >* No regressions for pru-unknown-elf. > >* Reverted r16-809-gf725d6765373f7 to expose the now latent PR119966. > > Then ensured pru-unknown-elf build is ok. Only two cases regressed > > where rnreg pass transforms a valid hardware subreg into invalid > > one. But I think that is not related to combine's PR119966: > >gcc.c-torture/execute/20040709-1.c > >gcc.c-torture/execute/20040709-2.c > > > > This patch was provisionally approved in: > >https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685492.html > > I'll wait for 2 days to get pre-commit CI results, then will commit it. > > > > PR target/119966 > > > > gcc/ChangeLog: > > > > * emit-rtl.cc (validate_subreg): Call simplify_subreg_regno > > instead of checking info.representable_p.. > > * rtl.h (simplify_subreg_regno): Add new argument > > allow_stack_regs. > > * rtlanal.cc (simplify_subreg_regno): Do not reject > > stack-related registers if allow_stack_regs is true. > Note that my system is primarily post-commit unless I > explicitly throw in a patch for testing purposes. So > figure ~24hrs after the patch goes in we'll have the > vast majority of the coverage. The native emulated > targets will trickle in over the remainder of the week. > I did more testing. riscv32-unknown-elf and riscv64-unknown-linux-gnu are ok. But this patch breaks AVR due to the following workaround for old reload in avr_hard_regno_mode_ok: if (GET_MODE_SIZE (mode) >= 4 && regno >= REG_X // This problem only concerned the old reload. && ! avropt_lra_p) return false; The following insn causes a split for "zero_extendsidi2" to ICE because operands are replaced with output from simplify_gen_subreg, which now fails: (insn 17 16 18 2 (set (reg/v:DI 24 r24 [orig:44 lowD.4526 ] [44]) (zero_extend:DI (reg:SI 22 r22 [orig:66 aD.4519 ] [66]))) "/mnt/nvme/dinux/local-workspace/gcc/libgcc/fixed-bit.c":790:7 827 {zero_extendsidi2} (nil)) That should be fixed once AVR is switched to LRA (PR113934). Richard, Jeff, it does not seem appropriate to merge this patch now, given that it breaks avr and or1k. Let me know if such experiment is desired despite the known breakages. Regards, Dimitar > Jeff >
Re: [PATCH v6 4/8] libstdc++: Implement layout_right from mdspan.
If not committed yet, there's a style error, see below. On 6/4/25 16:58, Luc Grosheintz wrote: Implement the parts of layout_left that depend on layout_right; and the parts of layout_right that don't depend on layout_stride. libstdc++-v3/ChangeLog: * include/std/mdspan (layout_right): New class. * src/c++23/std.cc.in: Add layout_right. Signed-off-by: Luc Grosheintz --- libstdc++-v3/include/std/mdspan | 153 ++- libstdc++-v3/src/c++23/std.cc.in | 1 + 2 files changed, 153 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan index 4c80438dbd6..8ab7a1a52fb 100644 --- a/libstdc++-v3/include/std/mdspan +++ b/libstdc++-v3/include/std/mdspan @@ -409,6 +409,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class mapping; }; + struct layout_right + { +template + class mapping; + }; + namespace __mdspan { template @@ -501,7 +507,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Mapping>; template - concept __standardized_mapping = __mapping_of; + concept __standardized_mapping = __mapping_of + || __mapping_of; // A tag type to create internal ctors. class __internal_ctor @@ -539,6 +546,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : mapping(__other.extents(), __mdspan::__internal_ctor{}) { } + template + requires (extents_type::rank() <= 1) + && is_constructible_v + constexpr explicit(!is_convertible_v<_OExtents, extents_type>) + mapping(const layout_right::mapping<_OExtents>& __other) noexcept + : mapping(__other.extents(), __mdspan::__internal_ctor{}) + { } + constexpr mapping& operator=(const mapping&) noexcept = default; @@ -605,6 +620,142 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION [[no_unique_address]] extents_type _M_extents{}; }; + namespace __mdspan + { +template + constexpr typename _Extents::index_type + __linear_index_right(const _Extents& __exts, _Indices... __indices) + noexcept + { + using _IndexType = typename _Extents::index_type; + array<_IndexType, sizeof...(__indices)> __ind_arr{__indices...}; + _IndexType __res = 0; + if constexpr (sizeof...(__indices) > 0) + { + _IndexType __mult = 1; + auto __update = [&, __pos = __exts.rank()](_IndexType) mutable + { + --__pos; + __res += __ind_arr[__pos] * __mult; + __mult *= __exts.extent(__pos); + }; + (__update(__indices), ...); + } + return __res; + } + } + + template +class layout_right::mapping +{ +public: + using extents_type = _Extents; + using index_type = typename extents_type::index_type; + using size_type = typename extents_type::size_type; + using rank_type = typename extents_type::rank_type; + using layout_type = layout_right; + + static_assert(__mdspan::__representable_size, + "The size of extents_type must be representable as index_type"); + + constexpr + mapping() noexcept = default; + + constexpr + mapping(const mapping&) noexcept = default; + + constexpr + mapping(const extents_type& __extents) noexcept + : _M_extents(__extents) + { __glibcxx_assert(__mdspan::__is_representable_extents(_M_extents)); } + + template + requires is_constructible_v + constexpr explicit(!is_convertible_v<_OExtents, extents_type>) + mapping(const mapping<_OExtents>& __other) noexcept + : mapping(__other.extents(), __mdspan::__internal_ctor{}) + { } + + template s/template + requires (extents_type::rank() <= 1) + && is_constructible_v + constexpr explicit(!is_convertible_v<_OExtents, extents_type>) + mapping(const layout_left::mapping<_OExtents>& __other) noexcept + : mapping(__other.extents(), __mdspan::__internal_ctor{}) + { } + + constexpr mapping& + operator=(const mapping&) noexcept = default; + + constexpr const extents_type& + extents() const noexcept { return _M_extents; } + + constexpr index_type + required_span_size() const noexcept + { return __mdspan::__fwd_prod(_M_extents, extents_type::rank()); } + + template<__mdspan::__valid_index_type... _Indices> + requires (sizeof...(_Indices) == extents_type::rank()) + constexpr index_type + operator()(_Indices... __indices) const noexcept + { + return __mdspan::__linear_index_right( + _M_extents, static_cast(__indices)...); + } + + static constexpr bool + is_always_unique() noexcept + { return true; } + + static constexpr bool + is_always_exhaustive() noexcept + { return true; } + + static constexpr bool + is_always_strided() noexcep
Re: [PATCH v6 6/8] libstdc++: Implement layout_stride from mdspan.
Same style error. On 6/4/25 16:58, Luc Grosheintz wrote: Implements the remaining parts of layout_left and layout_right; and all of layout_stride. The implementation of layout_stride::mapping::is_exhaustive applies the following change to the standard: 4266. layout_stride::mapping should treat empty mappings as exhaustive https://cplusplus.github.io/LWG/issue4266 The preconditions for layout_stride(extents, strides) are not checked. libstdc++-v3/ChangeLog: * include/std/mdspan (layout_stride): New class. * src/c++23/std.cc.in: Add layout_stride. Signed-off-by: Luc Grosheintz --- libstdc++-v3/include/std/mdspan | 249 ++- libstdc++-v3/src/c++23/std.cc.in | 3 +- 2 files changed, 250 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan index 8ab7a1a52fb..fe182c35a55 100644 --- a/libstdc++-v3/include/std/mdspan +++ b/libstdc++-v3/include/std/mdspan @@ -415,6 +415,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class mapping; }; + struct layout_stride + { +template + class mapping; + }; + namespace __mdspan { template @@ -508,7 +514,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template concept __standardized_mapping = __mapping_of - || __mapping_of; + || __mapping_of + || __mapping_of; // A tag type to create internal ctors. class __internal_ctor @@ -554,6 +561,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : mapping(__other.extents(), __mdspan::__internal_ctor{}) { } + template + requires is_constructible_v + constexpr explicit(extents_type::rank() > 0) + mapping(const layout_stride::mapping<_OExtents>& __other) + : mapping(__other.extents(), __mdspan::__internal_ctor{}) + { __glibcxx_assert(*this == __other); } + constexpr mapping& operator=(const mapping&) noexcept = default; @@ -684,6 +698,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : mapping(__other.extents(), __mdspan::__internal_ctor{}) { } + template s/class/typename/ + requires is_constructible_v + constexpr explicit(extents_type::rank() > 0) + mapping(const layout_stride::mapping<_OExtents>& __other) noexcept + : mapping(__other.extents(), __mdspan::__internal_ctor{}) + { __glibcxx_assert(*this == __other); } + constexpr mapping& operator=(const mapping&) noexcept = default; @@ -756,6 +777,232 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION [[no_unique_address]] extents_type _M_extents{}; }; + namespace __mdspan + { +template + concept __mapping_alike = requires + { + requires __is_extents; + { M::is_always_strided() } -> same_as; + { M::is_always_exhaustive() } -> same_as; + { M::is_always_unique() } -> same_as; + bool_constant::value; + bool_constant::value; + bool_constant::value; + }; + +template + constexpr typename _Mapping::index_type + __offset(const _Mapping& __m) noexcept + { + using _IndexType = typename _Mapping::index_type; + constexpr auto __rank = _Mapping::extents_type::rank(); + + if constexpr (__standardized_mapping<_Mapping>) + return 0; + else if (__empty(__m.extents())) + return 0; + else + { + auto __impl = [&__m](index_sequence<_Counts...>) + { return __m(((void) _Counts, _IndexType(0))...); }; + return __impl(make_index_sequence<__rank>()); + } + } + +template + constexpr typename _Mapping::index_type + __linear_index_strides(const _Mapping& __m, _Indices... __indices) + noexcept + { + using _IndexType = typename _Mapping::index_type; + _IndexType __res = 0; + if constexpr (sizeof...(__indices) > 0) + { + auto __update = [&, __pos = 0u](_IndexType __idx) mutable + { + __res += __idx * __m.stride(__pos++); + }; + (__update(__indices), ...); + } + return __res; + } + } + + template +class layout_stride::mapping +{ +public: + using extents_type = _Extents; + using index_type = typename extents_type::index_type; + using size_type = typename extents_type::size_type; + using rank_type = typename extents_type::rank_type; + using layout_type = layout_stride; + + static_assert(__mdspan::__representable_size, + "The size of extents_type must be representable as index_type"); + + constexpr + mapping() noexcept + { + // The precondition is either statically asserted, or automatically + // satisfied because dynamic extents are zero-initialzied. + size_t __stride = 1; + for (size_t __i = extents_type::rank(); __i > 0; --
Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic
On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > @@ -8165,20 +8169,25 @@ > > ;; > > ;; For unpacked vectors, it doesn't really matter whether SEL uses the > > ;; the container size or the element size. If SEL used the container size, > > -;; it would ignore undefined bits of the predicate but would copy the > > -;; upper (undefined) bits of each container along with the defined bits. > > -;; If SEL used the element size, it would use undefined bits of the > > predicate > > -;; to select between undefined elements in each input vector. Thus the > > only > > -;; difference is whether the undefined bits in a container always come from > > -;; the same input as the defined bits, or whether the choice can vary > > +;; it would ignore bits of the predicate that can be undefined, but would > > copy > > +;; the upper (undefined) bits of each container along with the defined > > bits. > > +;; If SEL used the element size, it would use bits from the predicate that > > can > > +;; be undefined to select between undefined elements in each input vector. > > +;; Thus the only difference is whether the undefined bits in a container > > always > > +;; come from the same input as the defined bits, or whether the choice can > > vary > > It looks like the main change here is to replace "undefined bits of the > predicate" with "bits of the predicate that can be undefined". Could > you go into more detail about the distinction? It seems to be saying > that the upper bits in each predicate are sometimes defined and > sometimes not. > > As I see it, the "level of undefinedness" is really the same for the > predicates and data vectors. Within each container element, the bits > that hold the element value are defined/significant and the other bits > are undefined/insignificant/have arbitrary values. The same thing > goes for the upper bits in each predicate element: the low bit is > defined/significant and the other bits are undefined/insignificant/have > arbitrary values. They might by chance be zeros when zeros are > convenient or ones when ones are convenient, but the semantics don't > guarantee anything either way. Yes, I agree. Sorry, my change was not very clear. What I was trying to reflect is that, for example, selecting between a pair of VNx4HF using VNx8BI is now a recognised insn. However, any bits of a VNx8BI that are not significant to a VNx4BI are don't-care wrt the result. I meant that they 'can be undefined' in that they are as good as undefined, for the purpose of SEL. Maybe a better change would have been to reword this paragraph in favour of 'don't-care' rather than 'undefined' when describing the upper bits of each predicate element? > > > ;; independently of the defined bits. > > ;; > > ;; For the other instructions, using the element size is more natural, > > ;; so we do that for SEL as well. > > +;; > > +;; The use of 'aarch64_predicate_operand' here is only to support the FP > > arithmetic/ > > +;; UNSPEC_SEL combiner patterns. As with those operations, any predicate > > bits that > > +;; are insignificant to the data mode will have no effect on the > > operation's result. > > Sorry for the formatting nit, but: long lines. The comment itself looks good > though. > > > +;; > > (define_insn "*vcond_mask_" > >[(set (match_operand:SVE_ALL 0 "register_operand") > > (unspec:SVE_ALL > > - [(match_operand: 3 "register_operand") > > + [(match_operand: 3 "aarch64_predicate_operand") > >(match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm") > >(match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")] > > UNSPEC_SEL))] > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index 287de0f5ae4..d38b108c5f4 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -3893,6 +3893,33 @@ aarch64_sve_fp_pred (machine_mode data_mode, rtx > > *strictness) > > return aarch64_ptrue_reg (aarch64_sve_pred_mode (data_mode)); > > } > > > > +/* If DATA_MODE is a partial vector mode, emit a sequence of insns to > > + zero-out the predicate bits of an existing natural GP, PRED, associated > > + with the undefined elements in each container > > This makes it sound unconditional, whereas it's really conditional on > flag_trapping_math. > > Also, I'd see this more as converting a container predicate to an > element predicate. So maybe: > > /* PRED is a predicate that governs an operation on DATA_MODE. If DATA_MODE >is a partial vector mode, and if exceptions must be suppressed for its >undefined elements, convert PRED from a container-level predicate to >an element-level predicate and ensure that the undefined elements >are inactive. > > But again, please suggest something else if you don't think that's > very clear. Yeah, happy to stick to the container/element size language. Thanks. > > > + > > + Return NULL_RTX if no i
Re: [PATCH] forwprop: Change optimize_agr_copyprop into forward walk instead of backwards
On Sun, Jun 8, 2025 at 7:52 PM Andrew Pinski wrote: > > While thinking about how to implement the rest of the copy prop and makes > sure not > to introduce some compile time problems, optimize_agr_copyprop should be > changed > into a forwproping rather than looking backwards. Can you explain the issues when doing the backwards looking? Note since this is a pure local "propagation" I still wonder whether (now in the forward direction) we want to use single_imm_use () instead of iterating over all uses of the VDEF. Basically we want to keep the optimization as "local" as possible. I also wondered about b = a; c = b; d = b; we'll transform c = b into c = a and that's it. In the end I still hope we can leverage more global analysis, like that of SRA, to decide whether propagation is profitable or not, thus whether we can elide a temporary. In that regard, shouldn't we avoid the propagation if 'DEST' has its address taken given we certainly cannot elide it? Thanks, Richard. > Bootstrapped and tested on x86_64-linux-gnu. > > gcc/ChangeLog: > > * tree-ssa-forwprop.cc (optimize_agr_copyprop): Change into a > forward looking (looking at vdef's uses) instead of a back > looking (vuse's def). > > Signed-off-by: Andrew Pinski > --- > gcc/tree-ssa-forwprop.cc | 121 +++ > 1 file changed, 60 insertions(+), 61 deletions(-) > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > index 43b1c9d696f..2dc77ccba1d 100644 > --- a/gcc/tree-ssa-forwprop.cc > +++ b/gcc/tree-ssa-forwprop.cc > @@ -1344,12 +1344,12 @@ optimize_memcpy_to_memset (gimple_stmt_iterator > *gsip, tree dest, tree src, tree >return true; > } > /* Optimizes > - a = c; > - b = a; > + DEST = SRC; > + DEST2 = DEST; # DEST2 = SRC2; > into > - a = c; > - b = c; > - GSIP is the second statement and SRC is the common > + DEST = SRC; > + DEST2 = SRC; > + GSIP is the first statement and SRC is the common > between the statements. > */ > static bool > @@ -1365,65 +1365,64 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) >if (operand_equal_p (dest, src, 0)) > return false; > > - tree vuse = gimple_vuse (stmt); > - /* If the vuse is the default definition, then there is no store > beforehand. */ > - if (SSA_NAME_IS_DEFAULT_DEF (vuse)) > -return false; > - gimple *defstmt = SSA_NAME_DEF_STMT (vuse); > - if (!gimple_assign_load_p (defstmt) > - || !gimple_store_p (defstmt)) > -return false; > - if (gimple_has_volatile_ops (defstmt)) > -return false; > - > - tree dest2 = gimple_assign_lhs (defstmt); > - tree src2 = gimple_assign_rhs1 (defstmt); > - > - /* If the original store is `src2 = src2;` skip over it. */ > - if (operand_equal_p (src2, dest2, 0)) > -return false; > - if (!operand_equal_p (src, dest2, 0)) > -return false; > - > - > - /* For 2 memory refences and using a temporary to do the copy, > - don't remove the temporary as the 2 memory references might overlap. > - Note t does not need to be decl as it could be field. > - See PR 22237 for full details. > - E.g. > - t = *a; > - *b = t; > - Cannot be convert into > - t = *a; > - *b = *a; > - Though the following is allowed to be done: > - t = *a; > - *a = t; > - And convert it into: > - t = *a; > - *a = *a; > - */ > - if (!operand_equal_p (src2, dest, 0) > - && !DECL_P (dest) && !DECL_P (src2)) > -return false; > - > - if (dump_file && (dump_flags & TDF_DETAILS)) > + tree vdef = gimple_vdef (stmt); > + imm_use_iterator iter; > + gimple *use_stmt; > + bool changed = false; > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, vdef) > { > - fprintf (dump_file, "Simplified\n "); > - print_gimple_stmt (dump_file, stmt, 0, dump_flags); > - fprintf (dump_file, "after previous\n "); > - print_gimple_stmt (dump_file, defstmt, 0, dump_flags); > -} > - gimple_assign_set_rhs_from_tree (gsip, unshare_expr (src2)); > - update_stmt (stmt); > + if (!gimple_assign_load_p (use_stmt) > + || !gimple_store_p (use_stmt)) > + continue; > + if (gimple_has_volatile_ops (use_stmt)) > + continue; > + tree dest2 = gimple_assign_lhs (use_stmt); > + tree src2 = gimple_assign_rhs1 (use_stmt); > + /* If the new store is `src2 = src2;` skip over it. */ > + if (operand_equal_p (src2, dest2, 0)) > + continue; > + if (!operand_equal_p (dest, src2, 0)) > + continue; > + /* For 2 memory refences and using a temporary to do the copy, > +don't remove the temporary as the 2 memory references might overlap. > +Note t does not need to be decl as it could be field. > +See PR 22237 for full details. > +E.g. > +t = *a; #DEST = SRC; > +*b = t; #DEST2 = SRC2; > +Cannot be convert into > +t = *a; > +*b = *a; > +Though the following is allo
RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]
> -Original Message- > From: Richard Biener > Sent: Monday, June 9, 2025 10:30 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > nd > Subject: Re: [PATCH 1/3]middle-end: support vec_cbranch_any and > vec_cbranch_all [PR118974] > > On Mon, 9 Jun 2025, Tamar Christina wrote: > > > This patch introduces two new vector cbranch optabs vec_cbranch_any and > > vec_cbranch_all. > > > > To explain why we need two new optabs let me explain the current cbranch and > its > > limitations and what I'm trying to optimize. So sorry for the long email, > > but I > > hope it explains why I think we want new optabs. > > > > Today cbranch can be used for both vector and scalar modes. In both these > > cases it's intended to compare boolean values, either scalar or vector. > > > > The optab documentation does not however state that it can only handle > > comparisons against 0. So many targets have added code for the vector > > variant > > that tries to deal with the case where we branch based on two non-zero > > registers. > > > > However this code can't ever be reached because the cbranch expansion only > deals > > with comparisons against 0 for vectors. This is because for vectors the > > rest of > > the compiler has no way to generate a non-zero comparison. e.g. the > > vectorizer > > will always generate a zero comparison, and the C/C++ front-ends won't allow > > vectors to be used in a cbranch as it expects a boolean value. ISAs like > > SVE > > work around this by requiring you to use an SVE PTEST intrinsics which > > results > > in a single scalar boolean value that represents the flag values. > > > > e.g. if (svptest_any (..)) > > > > The natural question is why do we not at expand time then rewrite the > comparison > > to a non-zero comparison if the target supports it. > > > > The reason is we can't safely do so. For an ANY comparison (e.g. != b) > > this is > > trivial, but for an ALL comparison (e.g. == b) we would have to flip both > > branch > > and invert the value being compared. i.e. we have to make it a != b > > comparison. > > > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because > they > > have already been lowered into a fall through branch (PC) and a label, > > ready for > > use in an if_then_else RTL expression. > > > > Additionally as mentioned before, cbranches expect the values to be masks, > > not > > values. This kinda works out if you XOR the values, but for FP vectors > > you'd > > need to know what equality means for the FP format. i.e. it's possible for > > IEEE 754 values but It's not immediately obvious if this is true for all > > formats. > > > > Now why does any of this matter? Well there are two optimizations we want > > to > be > > able to do. > > > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction for > > it. > >For both Integer and FP vectors we perform the comparisons as EQ and then > >invert the resulting mask. Ideally we'd like to replace this with just > > a XOR > >and the appropriate branch. > > > > 2. When on an SVE enabled system we would like to use an SVE compare + > branch > >for the Adv. SIMD sequence which could happen due to cost modelling. > However > >we can only do so based on if we know that the values being compared > > against > >are the boolean masks. This means we can't really use combine to do this > >because combine would have to match the entire sequence including the > >vector comparisons because at RTL we've lost the information that > >VECTOR_BOOLEAN_P would have given us. This sequence would be too long > for > >combine to match due to it having to match the compare + branch sequence > >being generated as well. It also becomes a bit messy to match ANY and > > ALL > >sequences. > > I guess I didn't really understand the above, esp. why all of this is > not recoverable on RTL, so I'll state what I think > cbranch is supposed to do and then ask some question on the proposal. > Quite simply the log chain is too long. SVE has flag setting comparisons, so in a single instruction we can perform the final comparison. (which doesn't need to be restricted to EQ or NE) and then using the appropriate condition code decide if we have an ANY or ALL comparison. So the sequence to match is very long. That is to say, in gimple we have a = b > c if (a != 0) ... and SVE doesn't need the indirection. Conceptually this makes it if (any (b > c)) ... For SVE the cbranch expands to a predicate test, which also sets the flag again. this duplication in flag setting is later then removed by various patterns in the backend. This is all fine, but where it gets difficult is trying to replace the Adv. SIMD Compare + branch with an SVE one. To do this it would have to match the entirety of a = b > c if (a != 0) which in RTL is (insn 66 65 67 4 (set (reg:V16QI 185 [ mask_patt_9.20_138 ]) (neg:V16QI (g
Re: [PATCH v2] libstdc++: hashing support for chrono value classes (P2592R2)
On Fri, Jun 6, 2025 at 9:07 AM Giuseppe D'Angelo wrote: > Hi Tomasz, > > Thank you for reviewing the original patch! > > I'm attaching a second version, hopefully addressing what you've > highlighed. I've also pushed it on Forge: > > https://forge.sourceware.org/gcc/gcc-TEST/pulls/52 Posted review there, but major change is, for type that are wrappers over integer: * use hash to hash a value, instead of jus returning it * specialize __is_fash_hash as true for them. > > > > On 24/04/2025 15:30, Tomasz Kaminski wrote: > > Hi, > > > > I am reattaching the original patch below, as I wasn't on the mailing > > list when it was sent. > > Thank you for submitting the patch and apologies for the late response. > > > > The major comment I have is that these are new C++26 classes, so we can > > use requires __is_hash_enabled_for<_Tp> and define only enabled > > specialization. In other cases we will be using a primary template that > > is disabled. > > > > Also, argument_type and result_type typedefs are no longer present since > > C++20, so you can remove them, and also remove __hash_base base classes > > that were responsible for providing them. > > Sounds good to me, I've done this cleanup. > > > + > > > +template > > > + static size_t __hash(const _Arg& __arg, const _Args&... __args) > > > + { > > > + const size_t __arg_hash = hash<_Arg>{}(__arg); > > > + size_t __result = _Hash_impl::hash(__arg_hash); > > ## COMMENT > > I think you can implement this using fold expression, as: > > (_Hash_impl::__hash_combine(hash<_Arg>{}(__args), __result), ...); > > Ok, I just wanted to make it work on C++11, but there's no real need for > that I guess... > > > > + __hash_combine(__result, __args...); > > > + return __result; > > > + } > > > + }; > > > +#endif // C++11 > > > + > > Thank you, > > -- > Giuseppe D'Angelo >
[COMMITTED 20/40] ada: Pragma Ada_XX not propagated from library level spec to body
From: Javier Miranda Add documentation to pragmas Ada_83, Ada_95, Ada_05, Ada_12, and Ada_2022: when placed before a library level package specification they are not propagated to the corresponding package body; they must be added explicitly to the package body. gcc/ada/ChangeLog: * doc/gnat_rm/implementation_defined_pragmas.rst: Adding documentation. * doc/gnat_ugn/the_gnat_compilation_model.rst: ditto. * gnat_rm.texi: Regenerate. * gnat_ugn.texi: Regenerate. Tested on x86_64-pc-linux-gnu, committed on master. --- .../implementation_defined_pragmas.rst| 25 +++ .../gnat_ugn/the_gnat_compilation_model.rst | 4 +++ gcc/ada/gnat_rm.texi | 25 +++ gcc/ada/gnat_ugn.texi | 6 - 4 files changed, 59 insertions(+), 1 deletion(-) diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst index cae8c168562..02013f1d9b1 100644 --- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst +++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst @@ -123,6 +123,11 @@ and generics may name types with unknown discriminants without using the ``(<>)`` notation. In addition, some but not all of the additional restrictions of Ada 83 are enforced. +Like all configuration pragmas, if the pragma is placed before a library +level package specification it is not propagated to the corresponding +package body (see RM 10.1.5(8)); it must be added explicitly to the +package body. + Ada 83 mode is intended for two purposes. Firstly, it allows existing Ada 83 code to be compiled and adapted to GNAT with less effort. Secondly, it aids in keeping code backwards compatible with Ada 83. @@ -149,6 +154,11 @@ contexts. This pragma is useful when writing a reusable component that itself uses Ada 95 features, but which is intended to be usable from either Ada 83 or Ada 95 programs. +Like all configuration pragmas, if the pragma is placed before a library +level package specification it is not propagated to the corresponding +package body (see RM 10.1.5(8)); it must be added explicitly to the +package body. + Pragma Ada_05 = @@ -166,6 +176,11 @@ This pragma is useful when writing a reusable component that itself uses Ada 2005 features, but which is intended to be usable from either Ada 83 or Ada 95 programs. +Like all configuration pragmas, if the pragma is placed before a library +level package specification it is not propagated to the corresponding +package body (see RM 10.1.5(8)); it must be added explicitly to the +package body. + The one argument form (which is not a configuration pragma) is used for managing the transition from Ada 95 to Ada 2005 in the run-time library. If an entity is marked @@ -209,6 +224,11 @@ contexts. This pragma is useful when writing a reusable component that itself uses Ada 2012 features, but which is intended to be usable from Ada 83, Ada 95, or Ada 2005 programs. +Like all configuration pragmas, if the pragma is placed before a library +level package specification it is not propagated to the corresponding +package body (see RM 10.1.5(8)); it must be added explicitly to the +package body. + The one argument form, which is not a configuration pragma, is used for managing the transition from Ada 2005 to Ada 2012 in the run-time library. If an entity is marked @@ -252,6 +272,11 @@ contexts. This pragma is useful when writing a reusable component that itself uses Ada 2022 features, but which is intended to be usable from Ada 83, Ada 95, Ada 2005 or Ada 2012 programs. +Like all configuration pragmas, if the pragma is placed before a library +level package specification it is not propagated to the corresponding +package body (see RM 10.1.5(8)); it must be added explicitly to the +package body. + The one argument form, which is not a configuration pragma, is used for managing the transition from Ada 2012 to Ada 2022 in the run-time library. If an entity is marked diff --git a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst index 64a363132c7..891886b5360 100644 --- a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst +++ b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst @@ -1477,6 +1477,10 @@ You can place configuration pragmas either appear at the start of a compilation unit or in a configuration pragma file that applies to all compilations performed in a given compilation environment. +Configuration pragmas placed before a library level package specification +are not propagated to the corresponding package body (see RM 10.1.5(8)); +they must be added explicitly to the package body. + GNAT includes the ``gnatchop`` utility to provide an automatic way to handle configuration pragmas that follows the semantics for compilations (that is, files with multiple units) described in the RM.
[COMMITTED 36/40] ada: Add null exclusion formal to Process_Subtype
From: Ronan Desplanques Before this patch, Process_Subtype looked at the parent of its argument to determine whether it was called in a context that excluded null. This patch replaces this lookup with a new formal parameter to Process_Subtype, and updates the calls to it accordingly. gcc/ada/ChangeLog: * sem_ch3.ads (Process_Subtype): Add formal. * sem_ch3.adb (Process_Subtype): Use new formal. (Analyze_Subtype_Declaration, Array_Type_Declaration, Build_Derived_Access_Type): Pass new actual. * sem_ch4.adb (Find_Type_Of_Object): Likewise. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch3.adb | 78 ++--- gcc/ada/sem_ch3.ads | 9 +++--- gcc/ada/sem_ch4.adb | 3 +- 3 files changed, 38 insertions(+), 52 deletions(-) diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb index 7cec589731f..6c2d0326c3f 100644 --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -5769,7 +5769,13 @@ package body Sem_Ch3 is Enter_Name (Id); end if; - T := Process_Subtype (Subtype_Indication (N), N, Id, 'P'); + T := +Process_Subtype + (Subtype_Indication (N), + N, + Id, + 'P', + Excludes_Null => Null_Exclusion_Present (N)); -- Class-wide equivalent types of records with unknown discriminants -- involve the generation of an itype which serves as the private view @@ -6586,7 +6592,13 @@ package body Sem_Ch3 is -- Process subtype indication if one is present if Present (Component_Typ) then - Element_Type := Process_Subtype (Component_Typ, P, Related_Id, 'C'); + Element_Type := + Process_Subtype + (Component_Typ, + P, + Related_Id, + 'C', + Excludes_Null => Null_Exclusion_Present (Component_Def)); Set_Etype (Component_Typ, Element_Type); -- Ada 2005 (AI-230): Access Definition case @@ -7202,7 +7214,11 @@ package body Sem_Ch3 is Set_Directly_Designated_Type (Derived_Type, Designated_Type (Parent_Type)); - Subt := Process_Subtype (S, N); + Subt := +Process_Subtype + (S, + N, + Excludes_Null => Null_Exclusion_Present (Type_Definition (N))); if Nkind (S) /= N_Subtype_Indication and then Subt /= Base_Type (Subt) @@ -18826,7 +18842,11 @@ package body Sem_Ch3 is -- Otherwise, the object definition is just a subtype_mark else - T := Process_Subtype (Obj_Def, Related_Nod); + T := + Process_Subtype + (Obj_Def, + Related_Nod, + Excludes_Null => Null_Exclusion_Present (Parent (Obj_Def))); end if; return T; @@ -22501,10 +22521,11 @@ package body Sem_Ch3 is - function Process_Subtype - (S : Node_Id; - Related_Nod : Node_Id; - Related_Id : Entity_Id := Empty; - Suffix : Character := ' ') return Entity_Id + (S : Node_Id; + Related_Nod : Node_Id; + Related_Id: Entity_Id := Empty; + Suffix: Character := ' '; + Excludes_Null : Boolean := False) return Entity_Id is procedure Check_Incomplete (T : Node_Id); -- Called to verify that an incomplete type is not used prematurely @@ -22538,8 +22559,6 @@ package body Sem_Ch3 is Full_View_Id: Entity_Id; Subtype_Mark_Id : Entity_Id; - May_Have_Null_Exclusion : Boolean; - -- Start of processing for Process_Subtype begin @@ -22560,33 +22579,10 @@ package body Sem_Ch3 is Check_Incomplete (S); P := Parent (S); - -- The following mirroring of assertion in Null_Exclusion_Present is - -- ugly, can't we have a range, a static predicate or even a flag??? - - May_Have_Null_Exclusion := - Present (P) - and then - Nkind (P) in N_Access_Definition - | N_Access_Function_Definition - | N_Access_Procedure_Definition - | N_Access_To_Object_Definition - | N_Allocator - | N_Component_Definition - | N_Derived_Type_Definition - | N_Discriminant_Specification - | N_Formal_Object_Declaration - | N_Function_Specification - | N_Object_Declaration - | N_Object_Renaming_Declaration - | N_Parameter_Specification - | N_Subtype_Declaration; - -- Ada 2005 (AI-231): Static check if Ada_Version >= Ada_2005 - and then May_Have_Null_Exclusion - and then Null_Exclusion_Present (P) - and then Nkind (P) /= N_Access_To_Object_Definition + and then Excludes_Null
[COMMITTED 34/40] ada: Missing discriminant check on assignment of Bounded_Vector aggregate
From: Gary Dismukes When a container aggregate for a Bounded_Vector type involves an iterated association that is assigned to a vector object whose capacity (as defined by the Capacity discriminant) is less than the number of elements of the aggregate, Constraint_Error should be raised due to failing a discriminant check on the assignment. But the compiler fails to do proper expansion, plus omits the check, and instead creates a temporary whose capacity is bounded by that of the target vector of the assignment. It attempts to assign all elements of the aggregate to the temporary, resulting in a failure on a call to the Replace_Element operation that assigns past the length of the temporary vector (which can result in a Storage_Error due to a segment violation). This is fixed by ensuring that the temporary object is declared with an unconstrained base subtype rather than the assignment target's constrained subtype. gcc/ada/ChangeLog: * exp_aggr.adb (Expand_Container_Aggregate): Use the Base_Type of the subtype provided by the context as the subtype of the temporary object initialized by the aggregate. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_aggr.adb | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb index 5450402f474..8db15fa6a11 100644 --- a/gcc/ada/exp_aggr.adb +++ b/gcc/ada/exp_aggr.adb @@ -7503,10 +7503,19 @@ package body Exp_Aggr is Set_Assignment_OK (Lhs); Aggr_Code := Build_Container_Aggr_Code (N, Typ, Lhs, Init); + + -- Use the unconstrained base subtype of the subtype provided by + -- the context for declaring the temporary object (which may come + -- from a constrained assignment target), to ensure that the + -- aggregate can be successfully expanded and assigned to the + -- temporary without exceeding its capacity. (Later assignment + -- of the temporary to a target object may result in failing + -- a discriminant check.) + Prepend_To (Aggr_Code, Make_Object_Declaration (Loc, Defining_Identifier => Obj_Id, - Object_Definition => New_Occurrence_Of (Typ, Loc), + Object_Definition => New_Occurrence_Of (Base_Type (Typ), Loc), Expression => Init)); Insert_Actions (N, Aggr_Code); -- 2.43.0
[COMMITTED 37/40] ada: Clarify code in Process_Subtype
From: Ronan Desplanques This patch factorizes two if statements together in the body of Process_Subtype, to improve readability. gcc/ada/ChangeLog: * sem_ch3.adb (Process_Subtype): Clarify code. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch3.adb | 90 ++--- 1 file changed, 43 insertions(+), 47 deletions(-) diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb index 6c2d0326c3f..9d93bf79c0c 100644 --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -22579,63 +22579,59 @@ package body Sem_Ch3 is Check_Incomplete (S); P := Parent (S); - -- Ada 2005 (AI-231): Static check + if Excludes_Null then +-- Create an Itype that is a duplicate of Entity (S) but with the +-- null-exclusion attribute. +if Is_Access_Type (Entity (S)) then + if Can_Never_Be_Null (Entity (S)) then + case Nkind (Related_Nod) is + when N_Full_Type_Declaration => +if Nkind (Type_Definition (Related_Nod)) + in N_Array_Type_Definition +then + Error_Node := + Subtype_Indication + (Component_Definition + (Type_Definition (Related_Nod))); +else + Error_Node := + Subtype_Indication + (Type_Definition (Related_Nod)); +end if; - if Ada_Version >= Ada_2005 - and then Excludes_Null - and then not Is_Access_Type (Entity (S)) - then -Error_Msg_N ("`NOT NULL` only allowed for an access type", S); - end if; + when N_Subtype_Declaration => +Error_Node := Subtype_Indication (Related_Nod); - -- Create an Itype that is a duplicate of Entity (S) but with the - -- null-exclusion attribute. + when N_Object_Declaration => +Error_Node := Object_Definition (Related_Nod); - if Is_Access_Type (Entity (S)) and then Excludes_Null then -if Can_Never_Be_Null (Entity (S)) then - case Nkind (Related_Nod) is - when N_Full_Type_Declaration => - if Nkind (Type_Definition (Related_Nod)) - in N_Array_Type_Definition - then + when N_Component_Declaration => Error_Node := Subtype_Indication -(Component_Definition - (Type_Definition (Related_Nod))); - else -Error_Node := - Subtype_Indication (Type_Definition (Related_Nod)); - end if; +(Component_Definition (Related_Nod)); - when N_Subtype_Declaration => - Error_Node := Subtype_Indication (Related_Nod); + when N_Allocator => +Error_Node := Expression (Related_Nod); - when N_Object_Declaration => - Error_Node := Object_Definition (Related_Nod); + when others => +pragma Assert (False); +Error_Node := Related_Nod; + end case; - when N_Component_Declaration => - Error_Node := - Subtype_Indication (Component_Definition (Related_Nod)); + Error_Msg_NE +("`NOT NULL` not allowed (& already excludes null)", + Error_Node, + Entity (S)); + end if; - when N_Allocator => - Error_Node := Expression (Related_Nod); - - when others => - pragma Assert (False); - Error_Node := Related_Nod; - end case; - - Error_Msg_NE - ("`NOT NULL` not allowed (& already excludes null)", - Error_Node, - Entity (S)); + Set_Etype + (S, + Create_Null_Excluding_Itype +(T => Entity (S), Related_Nod => P)); + Set_Entity (S, Etype (S)); +elsif Ada_Version >= Ada_2005 then + Error_Msg_N ("`NOT NULL` only allowed for an access type", S); end if; - -Set_Etype (S, - Create_Null_Excluding_Itype -(T => Entity (S), - Related_Nod => P)); -Set_Entity (S, Etype (S)); end if; return Entity (S); -- 2.43.0
[COMMITTED 40/40] ada: Support fixed-lower-bound array types as generic actual parameters
From: Gary Dismukes Attempting to use a fixed-lower-bound array type (or subtype) as an actual parameter for a formal unconstrained array type was being rejected by the compiler (complaining about the index type of the actual not matching the index type of the formal type). The compiler was improperly testing the actual's FLB range and finding that it didn't statically match the index type of the formal array type; it should instead test the underlying index type of the FLB type or subtype. gcc/ada/ChangeLog: * sem_ch3.adb (Constrain_Index): In the case of a fixed-lower-bound index, set Etype of the newly created itype's Scalar_Range from the index's Etype. * sem_ch12.adb (Validate_Array_Type_Instance): If the actual subtype is a fixed-lower-bound type, then check again the Etype of its Scalar_Range. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch12.adb | 10 ++ gcc/ada/sem_ch3.adb | 4 +++- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb index 3a31a921fe3..c9b9e7f87ec 100644 --- a/gcc/ada/sem_ch12.adb +++ b/gcc/ada/sem_ch12.adb @@ -14129,6 +14129,16 @@ package body Sem_Ch12 is T2 := Etype (I2); end if; +-- In the case of a fixed-lower-bound subtype, we want to check +-- against the index type's range rather than the range of the +-- subtype (which will be seen as unconstrained, and whose bounds +-- won't generally match those of the formal unconstrained array +-- type's corresponding index type). + +if Is_Fixed_Lower_Bound_Index_Subtype (T2) then + T2 := Etype (Scalar_Range (T2)); +end if; + if not Subtypes_Match (Find_Actual_Type (Etype (I1), A_Gen_T), T2) then diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb index 9d93bf79c0c..1263d7004d9 100644 --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -15099,7 +15099,8 @@ package body Sem_Ch3 is -- If this is a range for a fixed-lower-bound subtype, then set the -- index itype's low bound to the FLB and the index itype's upper bound -- to the high bound of the parent array type's index subtype. Also, - -- mark the itype as an FLB index subtype. + -- set the Etype of the new scalar range and mark the itype as an FLB + -- index subtype. if Nkind (S) = N_Range and then Is_FLB_Index then Set_Scalar_Range @@ -15107,6 +15108,7 @@ package body Sem_Ch3 is Make_Range (Sloc (S), Low_Bound => Low_Bound (S), High_Bound => Type_High_Bound (T))); + Set_Etype (Scalar_Range (Def_Id), Etype (Index)); Set_Is_Fixed_Lower_Bound_Index_Subtype (Def_Id); else -- 2.43.0
[COMMITTED 39/40] ada: Reject component-related aspects on formal non-array types
From: Piotr Trojanek In Ada 2022 aspects Atomic_Components and Volatile_Components can be specified for a formal array type, but they were wrongly accepted on any formal type. Also, we don't need to check if the corresponding pragmas appear in Ada 2022 mode, because generic formal parameters can't have explicit representation pragmas in any Ada version and can only have aspects since Ada 2022. gcc/ada/ChangeLog: * sem_prag.adb (Analyze_Pragma): Fix conditions for legality checks on formal type declarations. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_prag.adb | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb index 01983f9d89f..688ccc7c007 100644 --- a/gcc/ada/sem_prag.adb +++ b/gcc/ada/sem_prag.adb @@ -14677,21 +14677,18 @@ package body Sem_Prag is D := Declaration_Node (E); -if (Nkind (D) = N_Full_Type_Declaration and then Is_Array_Type (E)) +if (Nkind (D) in N_Full_Type_Declaration + | N_Formal_Type_Declaration + and then Is_Array_Type (E)) or else (Nkind (D) = N_Object_Declaration and then Ekind (E) in E_Constant | E_Variable and then Nkind (Object_Definition (D)) = N_Constrained_Array_Definition) - or else - (Ada_Version >= Ada_2022 - and then Nkind (D) = N_Formal_Type_Declaration) then -- The flag is set on the base type, or on the object - if Nkind (D) in N_Full_Type_Declaration - | N_Formal_Type_Declaration - then + if Is_Array_Type (E) then E := Base_Type (E); end if; -- 2.43.0
[COMMITTED 31/40] ada: Fix comment
From: Ronan Desplanques gcc/ada/ChangeLog: * sem.adb (Analyze): Fix comment. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem.adb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/ada/sem.adb b/gcc/ada/sem.adb index 06df00ec871..f5ce9f2300e 100644 --- a/gcc/ada/sem.adb +++ b/gcc/ada/sem.adb @@ -770,7 +770,7 @@ package body Sem is and then Present (Scope (E)) and then Ekind (Scope (E)) = E_Record_Type then - null; -- Set it later, in Analyze_Component_Declaration + null; -- Set it later, in Record_Type_Definition elsif not Is_Not_Self_Hidden (E) then Set_Is_Not_Self_Hidden (E); end if; -- 2.43.0
[COMMITTED 38/40] ada: Fix glitch in handling of Atomic_Components on generic formal type
From: Piotr Trojanek In Ada 2022 aspects Atomic_Components and Volatile_Components can be specified for a formal array type, but then they need to be set on the base type entity. Otherwise we get an assertion failure in debug build and wrong legality errors in production builds. gcc/ada/ChangeLog: * sem_prag.adb (Analyze_Pragma): If pragmas apply to a formal array type, then set the flags on the base type. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_prag.adb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb index 4090d0c7117..01983f9d89f 100644 --- a/gcc/ada/sem_prag.adb +++ b/gcc/ada/sem_prag.adb @@ -14689,7 +14689,9 @@ package body Sem_Prag is then -- The flag is set on the base type, or on the object - if Nkind (D) = N_Full_Type_Declaration then + if Nkind (D) in N_Full_Type_Declaration + | N_Formal_Type_Declaration + then E := Base_Type (E); end if; -- 2.43.0
Re: [PATCH] i386: Handle ZERO_EXTEND like SIGN_EXTEND in bsr patterns [PR120434]
On Fri, Jun 6, 2025 at 3:43 PM Jakub Jelinek wrote: > > Hi! > > The just posted second PR120434 patch causes > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler m(leaq|addq|incq)M > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not mmovlM+ > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not msubqM > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not mxor[lq]M > While the patch generally improves code generation by often using > ZERO_EXTEND instead of SIGN_EXTEND, where the former is often for free > on x86_64 while the latter requires an extra instruction or larger > instruction than one with just zero extend, the PR78103 combine patterns > and splitters were written only with SIGN_EXTEND in mind. As CLZ is UB > on 0 and otherwise returns just [0,63] and is xored with 63, ZERO_EXTEND > does the same thing there as SIGN_EXTEND. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2025-06-06 Jakub Jelinek > > PR middle-end/120434 > * config/i386/i386.md (*bsr_rex64_2): Rename to ... > (*bsr_rex64_2): ... this. Use any_extend instead of sign_extend. > (*bsr_2): Rename to ... > (*bsr_2): ... this. Use any_extend instead of sign_extend. > (bsr splitters after those): Use any_extend instead of sign_extend. OK. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2025-06-04 19:37:23.798424056 +0200 > +++ gcc/config/i386/i386.md 2025-06-06 10:56:04.006224677 +0200 > @@ -21512,11 +21512,12 @@ > (set_attr "mode" "SI")]) > > ; As bsr is undefined behavior on zero and for other input > -; values it is in range 0 to 63, we can optimize away sign-extends. > -(define_insn_and_split "*bsr_rex64_2" > +; values it is in range 0 to 63, we can optimize away sign-extends > +; or zero-extends. > +(define_insn_and_split "*bsr_rex64_2" >[(set (match_operand:DI 0 "register_operand") > (xor:DI > - (sign_extend:DI > + (any_extend:DI > (minus:SI > (const_int 63) > (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand")) > @@ -21538,9 +21539,9 @@ >operands[3] = lowpart_subreg (SImode, operands[2], DImode); > }) > > -(define_insn_and_split "*bsr_2" > +(define_insn_and_split "*bsr_2" >[(set (match_operand:DI 0 "register_operand") > - (sign_extend:DI > + (any_extend:DI > (xor:SI > (minus:SI > (const_int 31) > @@ -21617,7 +21618,7 @@ > (minus:DI > (match_operand:DI 2 "const_int_operand") > (xor:DI > - (sign_extend:DI > + (any_extend:DI > (minus:SI (const_int 63) > (subreg:SI > (clz:DI (match_operand:DI 1 "nonimmediate_operand")) > @@ -21647,7 +21648,7 @@ >[(set (match_operand:DI 0 "register_operand") > (minus:DI > (match_operand:DI 2 "const_int_operand") > - (sign_extend:DI > + (any_extend:DI > (xor:SI > (minus:SI (const_int 31) > (clz:SI (match_operand:SI 1 "nonimmediate_operand"))) > > > Jakub >
Re: [PATCH] [RFC] RISC-V: Add extra check to help choosing multilib with equivalent arch.
On Wed, May 28, 2025 at 8:03 PM yunzezhu wrote: > > > I thought this issue should be fixed when we implement those > > implication rules correctly? Does march=rv32imaf_zca/mabi=ilp32 still > > not able select march=rv32imac/mabi=ilp32 still happen after this[1] > > patch? > > > > [1] > > https://github.com/gcc-mirror/gcc/commit/42ce61eaefc4db70e2e7ea2d8ef091daa458eb48 > > Yes, march=rv32imaf_zca/mabi=ilp32 still not able to select > march=rv32imac/mabi=ilp32. > In my opinion that in order to imply C from zca and f ext, the arch must > contain zcf, because C+F is equivalent to F+Zca+Zcf and vice versa. The arch > rv32imaf_zca contains F and zca but no zcf so we cannot imply C and therefore > multilb rv32imac/mabi=ilp32 cannot be selected. But rv32imaf_zca means we don't have zcf, so it can not select rv32imac, otherwise that means the final binary will have zcf instruction and that won't be able to run on env that only have rv32imaf_zca.
[COMMITTED 18/40] ada: Remove unnecessary special handling
From: Ronan Desplanques This patch removes a special exemption in Enter_Name. That exemption was preceded by a comment which described what situations it was supposed to be required for, but it was unnecessary even in those situations. gcc/ada/ChangeLog: * sem_util.adb (Enter_Name): Remove special handling. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_util.adb | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb index 523aff33f95..59bf060ee74 100644 --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -8082,17 +8082,7 @@ package body Sem_Util is -- If we fall through, declaration is OK, at least OK enough to continue - -- If Def_Id is a discriminant or a record component we are in the midst - -- of inheriting components in a derived record definition. Preserve - -- their Ekind and Etype. - - if Ekind (Def_Id) in E_Discriminant | E_Component then - null; - - elsif Present (Etype (Def_Id)) then - null; - - else + if No (Etype (Def_Id)) then Set_Etype (Def_Id, Any_Type); -- avoid cascaded errors end if; -- 2.43.0
[COMMITTED 09/40] ada: Clarify warning in Atree.Rewrite documentation
From: Ronan Desplanques The documentation of Atree.Rewrite warns about a potential misuse of that subprogram. This patch makes the text of that warning more specific. The documentation of Atree.Replace had the same note but this patch replaces it with a mention of the one in Rewrite's documentation. gcc/ada/ChangeLog: * atree.ads (Rewrite, Replace): Clarify comments. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/atree.ads | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/gcc/ada/atree.ads b/gcc/ada/atree.ads index 14261692142..760c63b9bea 100644 --- a/gcc/ada/atree.ads +++ b/gcc/ada/atree.ads @@ -539,9 +539,10 @@ package Atree is -- newly constructed replacement subtree. -- -- Note: New_Node may not contain references to Old_Node, for example as - -- descendants, since the rewrite would make such references invalid. If - -- New_Node does need to reference Old_Node, then these references should - -- be to a relocated copy of Old_Node (see Relocate_Node procedure). + -- descendants, since the rewrite would turn them into cyclic + -- self-references. If New_Node does need to reference Old_Node, then these + -- references should be to a relocated copy of Old_Node (see Relocate_Node + -- procedure). -- -- Note: The Original_Node function applied to Old_Node (which has now -- been replaced by the contents of New_Node), can be used to obtain the @@ -555,10 +556,8 @@ package Atree is -- original contents of the Old_Node, but rather the New_Node value. -- Replace also preserves the setting of Comes_From_Source. -- - -- Note that New_Node must not contain references to Old_Node, for example - -- as descendants, since the rewrite would make such references invalid. If - -- New_Node does need to reference Old_Node, then these references should - -- be to a relocated copy of Old_Node (see Relocate_Node procedure). + -- The note in the documentation of Rewrite about the risk of creating + -- cyclic references also applies here. -- -- Replace is used in certain circumstances where it is desirable to -- suppress any history of the rewriting operation. Notably, it is used -- 2.43.0
[COMMITTED 35/40] ada: Call Mutate_Ekind earlier for formal entities
From: Ronan Desplanques This patch migrates the handling of "premature usage" type of error to the Is_Self_Hidden mechanism. gcc/ada/ChangeLog: * sem_ch6.adb (Set_Formal_Mode): Extend profile. Move parts of the body… (Process_Formals): … here. Move call to Set_Formal_Mode earlier. Call Set_Is_Not_Self_Hidden in second traversal. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch6.adb | 107 +--- 1 file changed, 52 insertions(+), 55 deletions(-) diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb index 913217102a7..a142a1c2f62 100644 --- a/gcc/ada/sem_ch6.adb +++ b/gcc/ada/sem_ch6.adb @@ -225,7 +225,10 @@ package body Sem_Ch6 is -- Create the declaration for an inequality operator that is implicitly -- created by a user-defined equality operator that yields a boolean. - procedure Set_Formal_Mode (Formal_Id : Entity_Id); + procedure Set_Formal_Mode + (Formal_Id : Entity_Id; + Spec : N_Parameter_Specification_Id; + Subp_Id : Entity_Id); -- Set proper Ekind to reflect formal mode (in, out, in out), and set -- miscellaneous other attributes. @@ -13066,13 +13069,10 @@ package body Sem_Ch6 is -- Start of processing for Process_Formals begin - -- In order to prevent premature use of the formals in the same formal - -- part, the Ekind is left undefined until all default expressions are - -- analyzed. The Ekind is established in a separate loop at the end. - Param_Spec := First (T); while Present (Param_Spec) loop Formal := Defining_Identifier (Param_Spec); + Set_Formal_Mode (Formal, Param_Spec, Current_Scope); Set_Never_Set_In_Source (Formal, True); Enter_Name (Formal); @@ -13390,12 +13390,48 @@ package body Sem_Ch6 is Analyze_Return_Type (Related_Nod); end if; - -- Now set the kind (mode) of each formal - Param_Spec := First (T); while Present (Param_Spec) loop Formal := Defining_Identifier (Param_Spec); - Set_Formal_Mode (Formal); + Set_Is_Not_Self_Hidden (Formal); + + -- Set Is_Known_Non_Null for access parameters since the language + -- guarantees that access parameters are always non-null. We also set + -- Can_Never_Be_Null, since there is no way to change the value. + + if Nkind (Parameter_Type (Param_Spec)) = N_Access_Definition then + +-- Ada 2005 (AI-231): In Ada 95, access parameters are always non- +-- null; In Ada 2005, only if then null_exclusion is explicit. + +if Ada_Version < Ada_2005 + or else Can_Never_Be_Null (Etype (Formal)) +then + Set_Is_Known_Non_Null (Formal); + Set_Can_Never_Be_Null (Formal); +end if; + + -- Ada 2005 (AI-231): Null-exclusion access subtype + + elsif Is_Access_Type (Etype (Formal)) + and then Can_Never_Be_Null (Etype (Formal)) + then +Set_Is_Known_Non_Null (Formal); + +-- We can also set Can_Never_Be_Null (thus preventing some junk +-- access checks) for the case of an IN parameter, which cannot +-- be changed, or for an IN OUT parameter, which can be changed +-- but not to a null value. But for an OUT parameter, the initial +-- value passed in can be null, so we can't set this flag in that +-- case. + +if Ekind (Formal) /= E_Out_Parameter then + Set_Can_Never_Be_Null (Formal); +end if; + end if; + + Set_Mechanism (Formal, Default_Mechanism); + Set_Formal_Validity (Formal); if Ekind (Formal) = E_In_Parameter then Default := Expression (Param_Spec); @@ -13666,23 +13702,23 @@ package body Sem_Ch6 is -- Set_Formal_Mode -- - - procedure Set_Formal_Mode (Formal_Id : Entity_Id) is - Spec : constant Node_Id := Parent (Formal_Id); - Id : constant Entity_Id := Scope (Formal_Id); - + procedure Set_Formal_Mode + (Formal_Id : Entity_Id; + Spec : N_Parameter_Specification_Id; + Subp_Id : Entity_Id) is begin -- Note: we set Is_Known_Valid for IN parameters and IN OUT parameters -- since we ensure that corresponding actuals are always valid at the -- point of the call. if Out_Present (Spec) then - if Is_Entry (Id) - or else Is_Subprogram_Or_Generic_Subprogram (Id) + if Is_Entry (Subp_Id) + or else Is_Subprogram_Or_Generic_Subprogram (Subp_Id) then -Set_Has_Out_Or_In_Out_Parameter (Id, True); +Set_Has_Out_Or_In_Out_Parameter (Subp_Id, True); end if; - if Ekind (Id) in E_Function | E_Generic_Function then + if Ekind (Subp_Id) in E_Function | E_Generic_Fu
[COMMITTED 29/40] ada: Fix SPARK test failures caused by new handling of inherited class-wide pre/post
From: Gary Dismukes The revised handling of inherited class-wide pre/postconditions (for properly implementing the rules of RM 6.1.1(7/5)) broke two SPARK tests (N709-001__contracts and V516-041__private_ownership). This change fixes that, by refining the test for detecting formal parameters used as actuals in calls to primitive functions, as well as adding handling for 'Result attributes given as actuals in such calls. gcc/ada/ChangeLog: * exp_util.adb (Call_To_Parent_Dispatching_Op_Must_Be_Mapped): Replace test of Covers with test of Is_Controlling_Formal. Add handling for 'Result actuals. Remove Actual_Type and its uses. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_util.adb | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb index b76d387c5a5..8ac1b9001a4 100644 --- a/gcc/ada/exp_util.adb +++ b/gcc/ada/exp_util.adb @@ -1552,7 +1552,6 @@ package body Exp_Util is pragma Assert (Nkind (Call_Node) = N_Function_Call); Actual : Node_Id := First_Actual (Call_Node); - Actual_Type : Entity_Id; Actual_Or_Prefix : Node_Id; begin @@ -1579,11 +1578,11 @@ package body Exp_Util is Actual_Or_Prefix := Actual; end if; - Actual_Type := Etype (Actual); - - if Is_Anonymous_Access_Type (Actual_Type) then - Actual_Type := Designated_Type (Actual_Type); - end if; + -- If at least one actual is a controlling formal + -- parameter of a class-wide Pre/Post aspect's + -- subprogram, the rule in RM 6.1.1(7) applies, + -- and we want to map the call to target the + -- corresponding function of the derived type. if Nkind (Actual_Or_Prefix) in N_Identifier @@ -1592,11 +1591,17 @@ package body Exp_Util is and then Is_Formal (Entity (Actual_Or_Prefix)) - and then Covers (Ctrl_Type, Actual_Type) + and then Is_Controlling_Formal +(Entity (Actual_Or_Prefix)) then - -- At least one actual is a formal parameter of - -- Par_Subp with type Ctrl_Type. + return True; + -- RM 6.1.1(7) also applies to Result attributes + -- of primitive functions with controlling results. + + elsif Is_Attribute_Result (Actual) + and then Has_Controlling_Result (Subp) + then return True; end if; -- 2.43.0
[COMMITTED 22/40] ada: Do not build dispatch tables for generics
From: Ronan Desplanques Before this patch, Build_Static_Dispatch_Tables was called on generic package bodies. While this has not been proved to cause any actual bug, it was clearly inappropriate and also useless, so this patch removes those calls. gcc/ada/ChangeLog: * sem_ch10.adb (Analyze_Compilation_Unit): Check for generic bodies. * exp_disp.adb (Build_Dispatch_Tables): Likewise. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_disp.adb | 4 +++- gcc/ada/sem_ch10.adb | 8 +++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb index 458b32c1730..080a2e1a6c1 100644 --- a/gcc/ada/exp_disp.adb +++ b/gcc/ada/exp_disp.adb @@ -413,7 +413,9 @@ package body Exp_Disp is if Nkind (D) = N_Package_Declaration then Build_Package_Dispatch_Tables (D); -elsif Nkind (D) = N_Package_Body then +elsif Nkind (D) = N_Package_Body + and then Ekind (Corresponding_Spec (D)) /= E_Generic_Package +then Build_Dispatch_Tables (Declarations (D)); elsif Nkind (D) = N_Package_Body_Stub diff --git a/gcc/ada/sem_ch10.adb b/gcc/ada/sem_ch10.adb index 25bba9b6075..45aabadf21f 100644 --- a/gcc/ada/sem_ch10.adb +++ b/gcc/ada/sem_ch10.adb @@ -1225,9 +1225,15 @@ package body Sem_Ch10 is if Expander_Active and then Tagged_Type_Expansion then case Nkind (Unit_Node) is -when N_Package_Declaration | N_Package_Body => +when N_Package_Declaration => Build_Static_Dispatch_Tables (Unit_Node); +when N_Package_Body => + if Ekind (Corresponding_Spec (Unit_Node)) /= E_Generic_Package + then + Build_Static_Dispatch_Tables (Unit_Node); + end if; + when N_Package_Instantiation => Build_Static_Dispatch_Tables (Instance_Spec (Unit_Node)); -- 2.43.0
[COMMITTED 21/40] ada: Tune recent change for warning about unsupported overlays
From: Piotr Trojanek Fix crash occurring when overlay applies to protected component and expansion is disabled, e.g. because of semantic checking mode (switch -gnatc) or because the compiler is running in GNATprove mode. Also, simply pick the type of overlaid object from the attribute prefix itself. gcc/ada/ChangeLog: * sem_util.adb (Find_Overlaid_Entity): Don't call Etype on empty Ent; tune style; move computation of Overl_Typ out of the loop. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_util.adb | 30 ++ 1 file changed, 6 insertions(+), 24 deletions(-) diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb index 59bf060ee74..c74c10f2b5f 100644 --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -8935,9 +8935,9 @@ package body Sem_Util is -- In the second case, the expr is either Y'Address, or recursively a -- constant that eventually references Y'Address. - Ent := Empty; + Ent := Empty; Ovrl_Typ := Empty; - Off := False; + Off := False; Expr := Expression (N); @@ -8967,6 +8967,8 @@ package body Sem_Util is end if; end loop; + Ovrl_Typ := Etype (Expr); + -- This loop checks the form of the prefix for an entity, using -- recursion to deal with intermediate components. @@ -8985,11 +8987,8 @@ package body Sem_Util is pragma Assert (not Expander_Active and then Is_Concurrent_Type (Scope (Ent))); - Ent := Empty; -end if; - -if No (Ovrl_Typ) then - Ovrl_Typ := Etype (Ent); + Ent := Empty; + Ovrl_Typ := Empty; end if; return; @@ -8997,23 +8996,6 @@ package body Sem_Util is -- Check for components elsif Nkind (Expr) in N_Selected_Component | N_Indexed_Component then -if Nkind (Expr) = N_Selected_Component then - -- If Something.Other'Address, use - -- the Etype of the Other component. - - if No (Ovrl_Typ) then - Ovrl_Typ := Etype (Entity (Selector_Name (Expr))); - end if; - -else - -- If Something(Index)'Address, use - -- the Etype of the array component. - - if No (Ovrl_Typ) then - Ovrl_Typ := Etype (Expr); - end if; -end if; - Expr := Prefix (Expr); Off := True; -- 2.43.0
[COMMITTED 16/40] ada: Constraint check on tagged build-in-place object decls
From: Bob Duff In the case of "X : T := F (...);", where T is a constrained discriminated tagged subtype, perform a constraint check after F returns. The result of F is allocated by the callee on the secondary stack in this case. Note that there are still missing checks for some build-in-place calls. gcc/ada/ChangeLog: * exp_ch6.adb: Remove a couple of "???" suggesting something that we will likely never do. (Make_Build_In_Place_Call_In_Object_Declaration): When a constraint check is needed, do the check. Do it at the call site for now. The check is still missing in the untagged case, because the caller allocates in that case. * sem_ch8.adb (Analyze_Object_Renaming): Remove obsolete transformation of a renaming into an object declaration. Given that we also (sometimes) tranform object declarations into renamings, this transformation was adding complexity; the new code in Make_Build_In_Place_Call_In_Object_Declaration above would need to explicitly avoid the run-time check in the case of renamings, because renamings are supposed to ignore the nominal subtype. Anyway, it is no longer needed. * exp_ch3.adb (Expand_N_Object_Declaration): Rewrite comment; it IS clear how to do it, but we haven't done it right yet. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_ch3.adb | 5 +++-- gcc/ada/exp_ch6.adb | 41 +++-- gcc/ada/sem_ch8.adb | 23 --- 3 files changed, 30 insertions(+), 39 deletions(-) diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb index d884e755d66..cf2238e9ee1 100644 --- a/gcc/ada/exp_ch3.adb +++ b/gcc/ada/exp_ch3.adb @@ -8741,8 +8741,9 @@ package body Exp_Ch3 is -- be illegal in some cases (such as converting access- -- to-unconstrained to access-to-constrained), but the -- the unchecked conversion will presumably fail to work - -- right in just such cases. It's not clear at all how to - -- handle this. + -- right in just such cases. In order to handle this + -- properly, in the Caller_Allocation case, the callee + -- needs to do the constraint check. Alloc_Stmt := Make_If_Statement (Loc, diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb index f85d977d0d8..84847377bf3 100644 --- a/gcc/ada/exp_ch6.adb +++ b/gcc/ada/exp_ch6.adb @@ -158,7 +158,7 @@ package body Exp_Ch6 is Alloc_Form_Exp : Node_Id := Empty; Pool_Exp : Node_Id := Empty); -- Ada 2005 (AI-318-02): If the result type of a build-in-place call needs - -- them, add the actuals parameters BIP_Alloc_Form and BIP_Storage_Pool. + -- them, add the actual parameters BIP_Alloc_Form and BIP_Storage_Pool. -- If Alloc_Form_Exp is present, then pass it for the first parameter, -- otherwise pass a literal corresponding to the Alloc_Form parameter -- (which must not be Unspecified in that case). If Pool_Exp is present, @@ -442,9 +442,7 @@ package body Exp_Ch6 is return; end if; - -- Locate the implicit allocation form parameter in the called function. - -- Maybe it would be better for each implicit formal of a build-in-place - -- function to have a flag or a Uint attribute to identify it. ??? + -- Locate the implicit allocation form parameter in the called function Alloc_Form_Formal := Build_In_Place_Formal (Function_Id, BIP_Alloc_Form); @@ -928,9 +926,6 @@ package body Exp_Ch6 is Formal_Suffix : constant String := BIP_Formal_Suffix (Kind); begin - -- Maybe it would be better for each implicit formal of a build-in-place - -- function to have a flag or a Uint attribute to identify it. ??? - -- The return type in the function declaration may have been a limited -- view, and the extra formals for the function were not generated at -- that point. At the point of call the full view must be available and @@ -8821,6 +8816,19 @@ package body Exp_Ch6 is and then not Has_Foreign_Convention (Return_Applies_To (Scope (Obj_Def_Id))); + Constraint_Check_Needed : constant Boolean := +(Has_Discriminants (Obj_Typ) or else Is_Array_Type (Obj_Typ)) + and then Is_Tagged_Type (Obj_Typ) + and then Is_Constrained (Obj_Typ); + -- We are processing a call in the context of something like + -- "X : T := F (...);". This is True if we need to do a constraint + -- check, because T has constrained bounds or discriminants, + -- and F is returning an unconstrained subtype. + -- We are currently doing the check at the call site, + -- which is possible only in the callee-allocates case, + -- which is why we
[COMMITTED 23/40] ada: Specialize syntax error on malformed Abstract_State contract
From: Piotr Trojanek Syntax for the Abstract_State contract is the same as for extended aggregates, but conceptually they are completely different. This patch specializes error messages emitted on syntax errors for these constructs. gcc/ada/ChangeLog: * par-ch13.adb (Get_Aspect_Specifications): Save and restore flag while parsing aspect Abstract_State. * par-ch2.adb (P_Pragma): Same while parsing pragma Abstract_State. * par-ch4.adb (P_Aggregate_Or_Paren_Expr): Specialize error message for contract Abstract_State and extended aggregate. * par.adb (Inside_Abstract_State): Add new context flag. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/par-ch13.adb | 7 +-- gcc/ada/par-ch2.adb | 15 +-- gcc/ada/par-ch4.adb | 9 +++-- gcc/ada/par.adb | 5 + 4 files changed, 26 insertions(+), 10 deletions(-) diff --git a/gcc/ada/par-ch13.adb b/gcc/ada/par-ch13.adb index f52136c916a..dbb894f79cd 100644 --- a/gcc/ada/par-ch13.adb +++ b/gcc/ada/par-ch13.adb @@ -503,6 +503,8 @@ package body Ch13 is or else A_Id = Aspect_Refined_Depends then Inside_Depends := True; + elsif A_Id = Aspect_Abstract_State then + Inside_Abstract_State := True; end if; -- Note that we have seen an Import aspect specification. @@ -529,9 +531,10 @@ package body Ch13 is Set_Expression (Aspect, P_Expression); end if; - -- Unconditionally reset flag for Inside_Depends + -- Unconditionally reset flag for being inside aspects - Inside_Depends := False; + Inside_Depends:= False; + Inside_Abstract_State := False; end if; -- Add the aspect to the resulting list only when it was properly diff --git a/gcc/ada/par-ch2.adb b/gcc/ada/par-ch2.adb index 20640d5547b..11c9a8384df 100644 --- a/gcc/ada/par-ch2.adb +++ b/gcc/ada/par-ch2.adb @@ -385,6 +385,8 @@ package body Ch2 is or else Chars (Ident_Node) = Name_Refined_Depends then Inside_Depends := True; + elsif Chars (Ident_Node) = Name_Abstract_State then + Inside_Abstract_State := True; end if; -- Scan arguments. We assume that arguments are present if there is @@ -441,11 +443,11 @@ package body Ch2 is Semicolon_Loc := Token_Ptr; - -- Cancel indication of being within a pragma or in particular a Depends - -- pragma. + -- Cancel indication of being within a pragma - Inside_Depends := False; - Inside_Pragma := False; + Inside_Depends:= False; + Inside_Abstract_State := False; + Inside_Pragma := False; -- Now we have two tasks left, we need to scan out the semicolon -- following the pragma, and we have to call Par.Prag to process @@ -472,8 +474,9 @@ package body Ch2 is exception when Error_Resync => Resync_Past_Semicolon; - Inside_Depends := False; - Inside_Pragma := False; + Inside_Depends:= False; + Inside_Abstract_State := False; + Inside_Pragma := False; return Error; end P_Pragma; diff --git a/gcc/ada/par-ch4.adb b/gcc/ada/par-ch4.adb index 8267a0c06d3..e6cf93ab387 100644 --- a/gcc/ada/par-ch4.adb +++ b/gcc/ada/par-ch4.adb @@ -1607,8 +1607,13 @@ package body Ch4 is -- Improper use of WITH elsif Token = Tok_With then -Error_Msg_SC ("WITH must be preceded by single expression in " & - "extension aggregate"); +if Inside_Abstract_State then + Error_Msg_SC ("state name with options must be enclosed in " & + "parentheses"); +else + Error_Msg_SC ("WITH must be preceded by single expression in " & + "extension aggregate"); +end if; raise Error_Resync; -- Range attribute can only appear as part of a discrete choice list diff --git a/gcc/ada/par.adb b/gcc/ada/par.adb index 5d61fac3c11..0003a33 100644 --- a/gcc/ada/par.adb +++ b/gcc/ada/par.adb @@ -80,6 +80,11 @@ function Par (Configuration_Pragmas : Boolean) return List_Id is -- True within a delta aggregate (but only after the "delta" token has -- been scanned). Used to distinguish syntax errors from syntactically -- correct "deep" delta aggregates (enabled via -gnatX0). + + Inside_Abstract_State : Boolean := False; + -- True within an Abstract_State contract. Used to distinguish syntax error + -- about extended aggregates and about a malformed contract. + Save_Style_Checks : Style_Check_Options; Save_Style_Check : Boolean; -- Variables for storing the original state of whether style checks should -- 2.43.0
[COMMITTED 11/40] ada: Remove outdated comment
From: Ronan Desplanques This patch removes a comment that was made incorrect by the introduction of Is_Self_Hidden. gcc/ada/ChangeLog: * sem_ch3.adb (Analyze_Object_Declaration): Remove comment. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch3.adb | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb index 690d6688958..a8764db6503 100644 --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -4659,9 +4659,7 @@ package body Sem_Ch3 is Set_Has_Completion (Id); end if; - -- Set type and resolve (type may be overridden later on). Note: - -- Ekind (Id) must still be E_Void at this point so that incorrect - -- early usage within E is properly diagnosed. + -- Set type and resolve (type may be overridden later on) Set_Etype (Id, T); -- 2.43.0
Re: [PATCH] x86: Extend the remove_redundant_vector pass
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common dominator for basic blocks with all uses > since constants and function arguments aren't changed. For broadcast > from variables with a single definition, the single definition is > replaced with the widest broadcast. > > gcc/ > > PR target/92080 > * config/i386/i386-expand.cc (ix86_expand_call): Set > recursive_function to true for recursive call. > * config/i386/i386-features.cc (ix86_place_single_vector_set): > Add an argument for inner scalar, default to nullptr. Set the > source from inner scalar if not nullptr. > (ix86_get_vector_load_mode): Add an argument for scalar mode and > handle integer and float scalar modes. > (replace_vector_const): Add an argument for scalar mode and pass > it to ix86_get_vector_load_mode. > (redundant_load_kind): New. > (redundant_load): Likewise. > (ix86_broadcast_inner): Likewise. > (remove_redundant_vector_load): Also support const0_rtx and > constm1_rtx broadcasts. Handle vector broadcasts from constant > and variable scalars. > * config/i386/i386.h (machine_function): Add recursive_function. > > gcc/testsuite/ > > * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect > movdqa instead pxor. > * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise. > * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise. > * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise. > * gcc.target/i386/pr92080-4.c: New test. > * gcc.target/i386/pr92080-5.c: Likewise. > * gcc.target/i386/pr92080-6.c: Likewise. > * gcc.target/i386/pr92080-7.c: Likewise. > * gcc.target/i386/pr92080-8.c: Likewise. > * gcc.target/i386/pr92080-9.c: Likewise. > * gcc.target/i386/pr92080-10.c: Likewise. > * gcc.target/i386/pr92080-11.c: Likewise. > * gcc.target/i386/pr92080-12.c: Likewise. > * gcc.target/i386/pr92080-13.c: Likewise. > * gcc.target/i386/pr92080-14.c: Likewise. > * gcc.target/i386/pr92080-15.c: Likewise. > * gcc.target/i386/pr92080-16.c: Likewise. >+ machine_mode mode = VOIDmode; >+ fixed_size_mode candidate; >+ FOR_EACH_MODE_IN_CLASS (mode, vklass) >+if (is_a (mode, &candidate) >+ && GET_MODE_INNER (candidate) == scalar_mode >+ && GET_MODE_SIZE (candidate) == size) >+ return mode; >+ >+ gcc_unreachable (); Can we just use default_vectorize_related_mode to get the wanted mode, or reuse the code in it? 1591/* The default implementation of TARGET_VECTORIZE_RELATED_MODE. */ 1592 1593opt_machine_mode 1594default_vectorize_related_mode (machine_mode vector_mode, 1595scalar_mode element_mode, 1596poly_uint64 nunits) 1597{ 1598 machine_mode result_mode; 1599 if ((maybe_ne (nunits, 0U) 1600 || multiple_p (GET_MODE_SIZE (vector_mode), 1601 GET_MODE_SIZE (element_mode), &nunits)) 1602 && mode_for_vector (element_mode, nunits).exists (&result_mode) 1603 && VECTOR_MODE_P (result_mode) 1604 && targetm.vector_mode_supported_p (result_mode)) 1605return result_mode; 1606 1607 return opt_machine_mode (); 1608} > + else if (CONST_VECTOR_P (op)) >+{ >+ rtx first = XVECEXP (op, 0, 0); >+ for (int i = 1; i < nunits; ++i) >+ { >+ rtx tmp = XVECEXP (op, 0, i); >+ /* Vector duplicate value. */ >+ if (!rtx_equal_p (tmp, first)) >+ return nullptr; >+ } >+ if (!CONSTANT_P (first)) >+ return nullptr; Is it really needed? Do we have a case where CONST_VECTOR_P has a non-constant component? Also I assume that allsame && CONST_VECTOR_P is already handled in ix86_expand_vector_init, so do we really need to handle it here? 17725 /* If all values are identical, broadcast the value. */ 17726 if (all_same 17727 && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, 17728XVECEXP (vals, 0, 0))) 17729return; >+ /* Check if there is a matching redundant vector load. */ >+ bool matched = false; >+ FOR_EACH_VEC_ELT (loads, i, load) It's expensive. Can we try with a hash like cse_insn? >+ if (load->val >+ && load->kind == kind >+ && load->mode == scalar_mode >+ /* Since CONST_INT load doesn't need memory, it must >+ be in the same basic block if it is in a recursive >+ call. */ This part is a bit tricky, It's used to avoid some regression which I guess just exposes some latent issue? And I didn't see any reason why recursive_call_p needs to be excluded for CONST_INT load and same bb. >+ && (!recursive_call_p >+ || load->bb == bb >+ || !(CONST_INT_P (load->val) >+&& load->kind ==
[COMMITTED 32/40] ada: Set Ekind of components earlier
From: Ronan Desplanques Before this patch, the calls to set the proper Ekind of component entities were delayed in order to catch "premature usage" type of errors. This patch moves those calls to the natural place, at the beginning of Analyze_Component_Declaration, and makes premature usage error dectection use the newer Is_Self_Hidden mechanism. The motivation for this patch is to accomodate future removals of operations on E_Void entities. gcc/ada/ChangeLog: * sem.adb (Analyze): Adapt to new Ekinds. * sem_ch3.adb (Analyze_Component_Declaration): Set Ekind early. (Is_Visible_Component, Record_Type_Definition): Adjust. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem.adb | 3 +-- gcc/ada/sem_ch3.adb | 9 + 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/ada/sem.adb b/gcc/ada/sem.adb index f5ce9f2300e..449fd8ad2c4 100644 --- a/gcc/ada/sem.adb +++ b/gcc/ada/sem.adb @@ -765,8 +765,7 @@ package body Sem is E : constant Entity_Id := Defining_Entity_Or_Empty (N); begin if Present (E) then -if Ekind (E) = E_Void - and then Nkind (N) = N_Component_Declaration +if Nkind (N) = N_Component_Declaration and then Present (Scope (E)) and then Ekind (Scope (E)) = E_Record_Type then diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb index 59f1dd2d8d3..7cec589731f 100644 --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -2046,6 +2046,7 @@ package body Sem_Ch3 is -- Start of processing for Analyze_Component_Declaration begin + Mutate_Ekind (Id, E_Component); Generate_Definition (Id); Enter_Name (Id); @@ -19833,7 +19834,9 @@ package body Sem_Ch3 is -- Start of processing for Is_Visible_Component begin - if Ekind (C) in E_Component | E_Discriminant then + if Ekind (C) in E_Component | E_Discriminant +and then Is_Not_Self_Hidden (C) + then Original_Comp := Original_Record_Component (C); end if; @@ -23123,10 +23126,8 @@ package body Sem_Ch3 is Component := First_Entity (Current_Scope); while Present (Component) loop - if Ekind (Component) = E_Void - and then not Is_Itype (Component) + if Ekind (Component) = E_Component and then not Is_Itype (Component) then -Mutate_Ekind (Component, E_Component); Reinit_Component_Location (Component); Set_Is_Not_Self_Hidden (Component); end if; -- 2.43.0
[COMMITTED 28/40] ada: Simplify handling of selected components as name references
From: Piotr Trojanek The selector_name of a selected_component always points to an identifier than is an object name, i.e. specifically, name of a component or discriminant. There is no need to examine this. Code cleanup; behavior is unaffected. gcc/ada/ChangeLog: * sem_util.adb (Is_Name_Reference): Remove check for selector_name of a selected_component; reuse existing code for indexed components and slices. (Statically_Names_Object): Remove dead code. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_util.adb | 14 +- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb index 2b7296b67e8..3c80d236af8 100644 --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -18375,6 +18375,7 @@ package body Sem_Util is case Nkind (N) is when N_Indexed_Component +| N_Selected_Component | N_Slice => return @@ -18386,13 +18387,6 @@ package body Sem_Util is when N_Attribute_Reference => return Attribute_Name (N) in Name_Input | Name_Old | Name_Result; - when N_Selected_Component => -return - Is_Name_Reference (Selector_Name (N)) -and then - (Is_Name_Reference (Prefix (N)) -or else Is_Access_Type (Etype (Prefix (N; - when N_Explicit_Dereference => return True; @@ -28517,12 +28511,6 @@ package body Sem_Util is return False; end if; -if Ekind (Entity (Selector_Name (N))) not in - E_Component | E_Discriminant -then - return False; -end if; - declare Comp : constant Entity_Id := Original_Record_Component (Entity (Selector_Name (N))); -- 2.43.0
[COMMITTED 15/40] ada: Remove incorrect bits in Copy_Node documentation
From: Ronan Desplanques This patch removes a leftover reference to the concept of node extension and a note about aspect specification that's been incorrect since at least the latest rework of aspect specification representation. gcc/ada/ChangeLog: * atree.ads (Copy_Node): Fix comment. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/atree.ads | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/gcc/ada/atree.ads b/gcc/ada/atree.ads index 760c63b9bea..615d040c90a 100644 --- a/gcc/ada/atree.ads +++ b/gcc/ada/atree.ads @@ -285,15 +285,11 @@ package Atree is procedure Copy_Node (Source, Destination : Node_Or_Entity_Id); -- Copy the entire contents of the source node to the destination node. - -- The contents of the source node is not affected. If the source node - -- has an extension, then the destination must have an extension also. - -- The parent pointer of the destination and its list link, if any, are - -- not affected by the copy. Note that parent pointers of descendants - -- are not adjusted, so the descendants of the destination node after - -- the Copy_Node is completed have dubious parent pointers. Note that - -- this routine does NOT copy aspect specifications, the Has_Aspects - -- flag in the returned node will always be False. The caller must deal - -- with copying aspect specifications where this is required. + -- The contents of the source node is not affected. The parent pointer of + -- the destination and its list link, if any, are not affected by the copy. + -- Note that parent pointers of descendants are not adjusted, so the + -- descendants of the destination node after the Copy_Node is completed + -- have dubious parent pointers. function New_Copy (Source : Node_Id) return Node_Id; -- This function allocates a new node, and then initializes it by copying -- 2.43.0
[COMMITTED 27/40] ada: Restrict Overlays_Constant flag to selected entities
From: Eric Botcazou Namely E_Constant and E_Variable entities. gcc/ada/ChangeLog: * einfo.ads (Overlays_Constant): Define in constants and variables. * gen_il-gen-gen_entities.adb (Entity_Kind): Move Overlays_Constant semantic flag to... (Constant_Or_Variable_Kind): ...here. * sem_util.adb (Note_Possible_Modification): Add guard. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/einfo.ads | 10 +- gcc/ada/gen_il-gen-gen_entities.adb | 2 +- gcc/ada/sem_util.adb| 1 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads index 545c15de24a..1cbac6d9a7d 100644 --- a/gcc/ada/einfo.ads +++ b/gcc/ada/einfo.ads @@ -3927,9 +3927,8 @@ package Einfo is -- Points to the component in the base type. --Overlays_Constant --- Defined in all entities. Set only for E_Constant or E_Variable for --- which there is an address clause that causes the entity to overlay --- a constant object. +-- Defined in constants and variables. Set if there is an address clause +-- that causes the entity to overlay a constant object. --Overridden_Operation -- Defined in subprograms. For overriding operations, points to the @@ -4961,7 +4960,6 @@ package Einfo is --Materialize_Entity --Needs_Debug_Info --Never_Set_In_Source - --Overlays_Constant --Referenced --Referenced_As_LHS --Referenced_As_Out_Parameter @@ -5288,7 +5286,7 @@ package Einfo is --Interface_Name(constants only) --Related_Type (constants only) --Initialization_Statements - --BIP_Initialization_Call + --BIP_Initialization_Call (constants only) --Finalization_Master_Node --Last_Aggregate_Assignment --Activation_Record_Component @@ -5318,6 +5316,7 @@ package Einfo is --Is_Volatile_Full_Access --Optimize_Alignment_Space (constants only) --Optimize_Alignment_Time (constants only) + --Overlays_Constant (constants only) --SPARK_Pragma_Inherited(constants only) --Stores_Attribute_Old_Prefix (constants only) --Treat_As_Volatile @@ -6205,6 +6204,7 @@ package Einfo is --OK_To_Rename --Optimize_Alignment_Space --Optimize_Alignment_Time + --Overlays_Constant --SPARK_Pragma_Inherited --Suppress_Initialization --Treat_As_Volatile diff --git a/gcc/ada/gen_il-gen-gen_entities.adb b/gcc/ada/gen_il-gen-gen_entities.adb index bfa634f8a69..8af261ac036 100644 --- a/gcc/ada/gen_il-gen-gen_entities.adb +++ b/gcc/ada/gen_il-gen-gen_entities.adb @@ -215,7 +215,6 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (May_Inherit_Delayed_Rep_Aspects, Flag), Sm (Needs_Debug_Info, Flag), Sm (Never_Set_In_Source, Flag), -Sm (Overlays_Constant, Flag), Sm (Prev_Entity, Node_Id), Sm (Referenced, Flag), Sm (Referenced_As_LHS, Flag), @@ -353,6 +352,7 @@ begin -- Gen_IL.Gen.Gen_Entities Sm (Last_Aggregate_Assignment, Node_Id), Sm (Optimize_Alignment_Space, Flag), Sm (Optimize_Alignment_Time, Flag), +Sm (Overlays_Constant, Flag), Sm (Prival_Link, Node_Id), Sm (Related_Type, Node_Id), Sm (Return_Statement, Node_Id), diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb index c74c10f2b5f..2b7296b67e8 100644 --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -25586,6 +25586,7 @@ package body Sem_Util is if Sure and then Modification_Comes_From_Source + and then Ekind (Ent) in E_Constant | E_Variable and then Overlays_Constant (Ent) and then Address_Clause_Overlay_Warnings then -- 2.43.0
[COMMITTED 14/40] ada: Check validity using signedness from the type and not its base type
From: Piotr Trojanek When attribute Valid is applied to a private type, we used the signedness of its implementation base type which wrongly included negative values. gcc/ada/ChangeLog: * exp_attr.adb (Expand_N_Attribute_Reference): When expanding attribute Valid, use signedness from the validated view, not from its base type. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/exp_attr.adb | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb index f1f8424d720..3d1bff93b40 100644 --- a/gcc/ada/exp_attr.adb +++ b/gcc/ada/exp_attr.adb @@ -8183,9 +8183,8 @@ package body Exp_Attr is else declare Uns : constant Boolean := -Is_Unsigned_Type (Ptyp) - or else (Is_Private_Type (Ptyp) -and then Is_Unsigned_Type (PBtyp)); + Is_Unsigned_Type (Validated_View (Ptyp)); + Size : Uint; P: Node_Id := Pref; -- 2.43.0
Re: [PATCH 04/14] aarch64: Add support for unpacked SVE FP comparisons
On Fri, Jun 06, 2025 at 10:02:19AM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > @@ -27292,10 +27291,16 @@ aarch64_emit_sve_invert_fp_cond (rtx target, > > rtx_code code, rtx pred, > > void > > aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx > > op1) > > { > > - machine_mode pred_mode = GET_MODE (target); > >machine_mode data_mode = GET_MODE (op0); > > + rtx pred = aarch64_sve_fp_pred (data_mode, nullptr); > > > > - rtx ptrue = aarch64_ptrue_reg (pred_mode); > > + /* The governing and destination modes. */ > > + machine_mode pred_mode = GET_MODE (pred); > > + machine_mode target_mode = GET_MODE (target); > > + > > + /* Also determines SVE_KNOWN_PTRUE, since an unnatural GP from > > + sve_fp_pred would disable part of the operation. */ > > + bool natural_p = pred_mode == target_mode; > > I'm not sure about calling this "unnatural". The SVE predicate system > was designed with this kind of flexibility in mind. How about: > > /* For partial vector modes, the choice of predicate mode depends > on whether we need to suppress exceptions for inactive elements. > If we do need to suppress exceptions, the predicate mode matches > the element size rather than the container size and the predicate > marks the upper bits in each container as inactive. The predicate > is then a ptrue wrt target_mode but not wrt pred_mode. It is the > ptrueness wrt pred_mode that matters here. > > If we don't need to suppress exceptions, the predicate mode matches > the container size, pred_mode == target_mode, and the predicate is > thus a ptrue wrt both target_mode and pred_mode. */ > bool known_ptrue_p = pred_mode == target_mode; OK, I think referring to containers and elements is a good call. Maybe we ought to add a comment above the definition of VPRED, along the lines of: ;; For partial vector modes, this is the predicate mode associated ;; with the container size. Then your suggestion for patch 06 sounds good too. > > There again, maybe my comment makes no sense to anyone other than me, > so please do push back if you have a better suggestion! Only that perhaps the last part of the first section could be: /* The predicate is then a ptrue wrt target_mode but not wrt pred_mode, it is the latter which matters here. */ I'll be adding 'ptrueness' to my dictionary either way! :) > > >switch (code) > > { > > case UNORDERED: > > [...] > > @@ -27333,11 +27338,21 @@ aarch64_expand_sve_vec_cmp_float (rtx target, > > rtx_code code, rtx op0, rtx op1) > > case UNGE: > >if (flag_trapping_math) > > { > > - /* Work out which elements are ordered. */ > > - rtx ordered = gen_reg_rtx (pred_mode); > > op1 = force_reg (data_mode, op1); > > - aarch64_emit_sve_invert_fp_cond (ordered, UNORDERED, > > - ptrue, true, op0, op1); > > + > > + /* Work out which elements are unordered. */ > > + rtx uo_tmp = gen_reg_rtx (target_mode); > > + aarch64_emit_sve_fp_cond (uo_tmp, UNORDERED, pred, natural_p, > > + op0, op1); > > + > > + /* Invert the result. Use PRED again to maintain the intended > > +trapping behavior. */ > > + if (!natural_p) > > + uo_tmp = gen_lowpart (pred_mode, uo_tmp); > > The !natural_p test isn't necessary here, and IMO it's slightly easier > to follow the code without it. The lowpart is a natural component of > the following XOR/NOT operation and is necessary to make the operation > well-typed. > > > + > > + rtx ordered = gen_reg_rtx (pred_mode); > > + emit_insn (gen_aarch64_pred_z (XOR, pred_mode, > > +ordered, pred, pred, uo_tmp)); > > Although the underlying instruction is an EOR, (and (xor a b) a) isn't > canonical rtl: it should be folded to (and (not b) a) instead. > > So I think we should rename: > > ;; Predicated predicate inverse. > (define_insn "*one_cmpl3" > [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") > (and:PRED_ALL > (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) > (match_operand:PRED_ALL 1 "register_operand" "Upa")))] > "TARGET_SVE" > "not\t%0.b, %1/z, %2.b" > ) > > to "@aarch64_pred_one_cmpl_z" and use gen_aarch64_pred_one_compl_z > here. (Not the most elegant instruction name, but I suppose we should > be consistent...) > > This will need updates to the testcase to match NOT rather than EOR. Thanks for the catch & fix, sounds good. > > OK with those changes, thanks. > > Richard
Re: [PATCH 08/14] aarch64: Add support for unpacked SVE FP binary arithmetic
On Fri, Jun 06, 2025 at 12:18:15PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > This patch extends the expanders for unpredicated smax, smin, add, sub, > > mul, min, and max, so that they support partial SVE FP modes. > > > > The relevant insn/split patterns have also been updated. > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve.md (3): Extend from > > SVE_FULL_F to SVE_F, and use aarch64_sve_fp_pred. > > (@aarch64_pred_): Extend from SVE_FULL_F to SVE_F, > > use aarch64_predicate_operand. (ADD/SUB/MUL/MAX/MIN). > > * config/aarch64/aarch64-sve2.md: Likewise, for BF16 operations. > > > > gcc/testsuite/ChangeLog: > > > > * g++.target/aarch64/sve/unpacked_binary_bf16_1.C: New test. > > * g++.target/aarch64/sve/unpacked_binary_bf16_2.C: Likewise. > > * gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fadd_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fadd_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fmaxnm_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fmaxnm_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fminnm_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fminnm_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fmul_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fmul_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fsubr_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fsubr_2.c: Likewise. > > OK, thanks. > > > --- > > gcc/config/aarch64/aarch64-sve.md | 70 +-- > > gcc/config/aarch64/aarch64-sve2.md| 10 +-- > > .../aarch64/sve/unpacked_binary_bf16_1.C | 35 ++ > > .../aarch64/sve/unpacked_binary_bf16_2.C | 15 > > .../aarch64/sve/unpacked_builtin_fmax_1.c | 40 +++ > > .../aarch64/sve/unpacked_builtin_fmax_2.c | 16 + > > .../aarch64/sve/unpacked_builtin_fmin_1.c | 40 +++ > > .../aarch64/sve/unpacked_builtin_fmin_2.c | 16 + > > .../gcc.target/aarch64/sve/unpacked_fadd_1.c | 48 + > > .../gcc.target/aarch64/sve/unpacked_fadd_2.c | 22 ++ > > .../aarch64/sve/unpacked_fmaxnm_1.c | 41 +++ > > .../aarch64/sve/unpacked_fmaxnm_2.c | 16 + > > .../aarch64/sve/unpacked_fminnm_1.c | 42 +++ > > .../aarch64/sve/unpacked_fminnm_2.c | 16 + > > .../gcc.target/aarch64/sve/unpacked_fmul_1.c | 39 +++ > > .../gcc.target/aarch64/sve/unpacked_fmul_2.c | 14 > > .../gcc.target/aarch64/sve/unpacked_fsubr_1.c | 42 +++ > > .../gcc.target/aarch64/sve/unpacked_fsubr_2.c | 16 + > > 18 files changed, 498 insertions(+), 40 deletions(-) > > create mode 100644 > > gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_1.C > > create mode 100644 > > gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_2.C > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_2.c > > > > diff --git a/gcc/config/aarch64/aarch64-sve.md > > b/gcc/config/aarch64/aarch64-sve.md > > index 76de511420f..cdad900d9cf 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -5473,27 +5473,27 @@ > > ;; Split a predicated instruction whose predicate is unused into an > > ;; unpredicated instruction. > > (define_split > > - [(set (match_operand:SVE_FULL_F_B16B16 0 "register_operand") > > - (unspec:SVE_FULL_F_B16B16 > > + [(set (match_operand:SVE_F_B16B16 0 "register_operand") > > + (unspec:SVE_F_B16B16 > > [(match_operand: 1 "register_operand") > >(match_operand:SI 4 "aarch64
Re: [PATCH 0/4] RISC-V: Add new segment load/store intrinsics for xtheadvector
Hi Yunze: I thought the T-head vector should just reuse segments load/store pattern from standard vector instruction is enough and then adjust the output name at th_asm_output_opcode , do you have a good reason why we need to add those patterns for T-head vector again? I am really concerned about this patch set increasing the number of patterns too much. On Fri, Jun 6, 2025 at 5:44 PM wrote: > > From: Yunze Zhu > > This series add xtheadvector-specific segment load/store intrinsics support, > including: > [1/4] xtheadvector unit stride segment load/store intrinsics, > [2/4] xtheadvector stride segment load/store intrinsics, > [3/4] xtheadvector indexed stride segment load/store intrinsics, > [4/4] xtheadvector fault-only-first stride segment load intrinsics. > > The goal is to enhance support for xtheadvector extension by > adding generation of xtheadvector-specific segment load/store > instructions with intrinsics. > https://github.com/XUANTIE-RV/thead-extension-spec/pull/66 > > gcc/config.gcc | 2 +- > gcc/config/riscv/riscv-vector-builtins-shapes.cc | 74 > > gcc/config/riscv/riscv-vector-builtins-shapes.h | 2 + > gcc/config/riscv/riscv-vector-builtins-types.def | 82 > > gcc/config/riscv/riscv-vector-builtins.cc| 92 > + > gcc/config/riscv/t-riscv | 19 ++ > gcc/config/riscv/thead-vector-builtins-bases.cc | 318 > > gcc/config/riscv/thead-vector-builtins-bases.h | 44 + > gcc/config/riscv/thead-vector-builtins-functions.def | 12 ++ > gcc/config/riscv/thead-vector.md | 267 > +++ > gcc/config/riscv/thead.cc| 68 > --- > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlseg-vsseg.c| 118 > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlsegff-vsseg.c | 118 > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlsegu-vsseg.c | 116 > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlseguff-vsseg.c | 115 > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlsseg-vssseg.c | 125 > + > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlssegu-vssseg.c | 125 > + > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlxseg-vsxseg.c | 125 > + > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlxsegu-vsxseg.c | 125 > + > 19 files changed, 1922 insertions(+), 25 deletions(-)
Re: [PATCH 1/4] RISC-V: Add support for xtheadvector unit-stride segment load/store intrinsics
> diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc > index de23e410d4c..b78e2762cfa 100644 > --- a/gcc/config/riscv/thead.cc > +++ b/gcc/config/riscv/thead.cc > @@ -1059,12 +1059,17 @@ th_asm_output_opcode (FILE *asm_out_file, const char > *p) > get_attr_type (current_output_insn) == TYPE_VSSEGTE > ? fputs ("th.vsseg", asm_out_file) > : fputs ("th.vlseg", asm_out_file); > - asm_fprintf (asm_out_file, "%c", p[5]); > - fputs ("e", asm_out_file); > - if (strstr (p, "e8")) > - return p+8; > - else > - return p+9; > +if (strstr (p, "b") || strstr (p, "h") || strstr (p, "w")) > + return p+5; > +else > +{ > + asm_fprintf (asm_out_file, "%c", p[5]); > + fputs ("e", asm_out_file); > + if (strstr (p, "e8")) > + return p+8; > + else > + return p+9; > +} indentation seems wrong for this file. > } > >if (get_attr_type (current_output_insn) == TYPE_VLSEGDS || > @@ -1073,36 +1078,51 @@ th_asm_output_opcode (FILE *asm_out_file, const char > *p) > get_attr_type (current_output_insn) == TYPE_VSSEGTS > ? fputs ("th.vssseg", asm_out_file) > : fputs ("th.vlsseg", asm_out_file); > - asm_fprintf (asm_out_file, "%c", p[6]); > - fputs ("e", asm_out_file); > - if (strstr (p, "e8")) > - return p+9; > - else > - return p+10; > +if (strstr (p, "b") || strstr (p, "h") || strstr (p, "w")) > + return p+6; > +else > +{ > + asm_fprintf (asm_out_file, "%c", p[6]); > + fputs ("e", asm_out_file); > + if (strstr (p, "e8")) > + return p+9; > + else > + return p+10; > +} > } > >if (get_attr_type (current_output_insn) == TYPE_VLSEGDUX || > get_attr_type (current_output_insn) == TYPE_VLSEGDOX) > { > fputs ("th.vlxseg", asm_out_file); > - asm_fprintf (asm_out_file, "%c", p[7]); > - fputs ("e", asm_out_file); > - if (strstr (p, "ei8")) > - return p+11; > - else > - return p+12; > + if (strstr (p, "b") || strstr (p, "h") || strstr (p, "w")) > + return p+6; > +else > +{ > + asm_fprintf (asm_out_file, "%c", p[7]); > + fputs ("e", asm_out_file); > + if (strstr (p, "ei8")) > + return p+11; > + else > + return p+12; > +} > } > >if (get_attr_type (current_output_insn) == TYPE_VSSEGTUX || > get_attr_type (current_output_insn) == TYPE_VSSEGTOX) > { > fputs ("th.vsxseg", asm_out_file); > - asm_fprintf (asm_out_file, "%c", p[7]); > - fputs ("e", asm_out_file); > - if (strstr (p, "ei8")) > - return p+11; > - else > - return p+12; > + if (strstr (p, "b") || strstr (p, "h") || strstr (p, "w")) > + return p+6; > +else > +{ > + asm_fprintf (asm_out_file, "%c", p[7]); > + fputs ("e", asm_out_file); > + if (strstr (p, "ei8")) > + return p+11; > + else > + return p+12; > +} > } > >if (get_attr_type (current_output_insn) == TYPE_VNSHIFT)
回复:[PATCH] [RFC] RISC-V: Add extra check to help choosing multilib with equivalent arch.
> But rv32imaf_zca means we don't have zcf, so it can not select > rv32imac, otherwise that means the final binary will have zcf > instruction and that won't be able to run on env that only have > rv32imaf_zca. In my opinion the arch of multilib is the arch used when compiling the lib, that is the binary of lib rv32imac does not contain zcf instructions. Since none of binaries to be link contain zcf instructions, I think the final binary generated by linking does not contain zcf instructions when select rv32imac.
Re: [PATCH 09/14] aarch64: Add support for unpacked SVE FDIV
On Fri, Jun 06, 2025 at 12:46:32PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > This patch extends the unpredicated FP division expander to support > > partial FP modes. It extends the existing patterns used to implement > > UNSPEC_COND_FDIV and it's approximation as needed. > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve.md: (@aarch64_sve_): > > Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand. > > (div3): Extend from SVE_FULL_F to SVE_F. > > (@aarch64_frecpe): Likewise. > > (@aarch64_frecps): Likewise. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/sve/unpacked_fdiv_1.c: New test. > > * gcc.target/aarch64/sve/unpacked_fdiv_2.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_fdiv_3.c: Likewise. > > --- > > gcc/config/aarch64/aarch64-sve.md | 50 +-- > > .../gcc.target/aarch64/sve/unpacked_fdiv_1.c | 34 + > > .../gcc.target/aarch64/sve/unpacked_fdiv_2.c | 11 > > .../gcc.target/aarch64/sve/unpacked_fdiv_3.c | 11 > > 4 files changed, 81 insertions(+), 25 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_3.c > > > > diff --git a/gcc/config/aarch64/aarch64-sve.md > > b/gcc/config/aarch64/aarch64-sve.md > > index cdad900d9cf..79a087837de 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -3752,9 +3752,9 @@ > > > > ;; Unpredicated floating-point unary operations. > > (define_insn "@aarch64_sve_" > > - [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w") > > - (unspec:SVE_FULL_F > > - [(match_operand:SVE_FULL_F 1 "register_operand" "w")] > > + [(set (match_operand:SVE_F 0 "register_operand" "=w") > > + (unspec:SVE_F > > + [(match_operand:SVE_F 1 "register_operand" "w")] > > SVE_FP_UNARY))] > >"TARGET_SVE" > >"\t%0., %1." > > I agree the patch is correct for the current definitions of SVE_FP_UNARY > and SVE_FP_BINARY. Since the names are generic, though, I think it > would be worth adding a comment to iterators.md above the definition > of both iterators, saying something like: > > ;; This iterator is currently only used for estimation instructions, > ;; where lax handling of exceptions is assumed to be acceptable. > ;; The iterator is therefore applied unconditionally to partial FP modes. > ;; This would need to be revisited if new operations are added in future. > > (feel free to reword) > > The patch LGTM with that change. > > That said, I suppose the documentation of the -mlow-precision-* > options doesn't explicit state that they change exception behaviour. > Maybe it would be safer to leave the reciprocal stuff out for now, > even though wanting low-precision results with strict exception > conformance seems like an odd combination. Does anyone else have > any thoughts? Yeah, I agree that it's not immediately clear whether -mlow-precision-* alone justifies this. I wouldn't have made this change if the low- precision expansion wasn't predicated on all of: if (!flag_finite_math_only || flag_trapping_math || !flag_unsafe_math_optimizations || optimize_function_for_size_p (cfun) || !use_approx_division_p) return false; Which, IIUC, reflects the fact that it also requires -ffast-math or -funsafe-math-optimizations. I should have placed an assertion (or something similar) to make sure that we have !flag_trapping_math when the low-precision expander is handling partial vector modes. Perhaps something for V2? Happy to drop it for now if not. > > Richard
Re: [PATCH 4/4] RISC-V: Add support for xtheadvector fault-only-first segment load/store intrinsics
I got compilation issue for this patch: ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc: In member function ‘rtx_def* riscv_vector::th_vlseg::exp and(riscv_vector::function_expander&) const’: ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:68:10: error: ‘IS_FAULT_ONLY_FIRST’ was not declared in this scope 68 | if (!IS_FAULT_ONLY_FIRST) | ^~~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc: At global scope: ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:298:43: error: wrong number of template arguments (2, should be at l east 0) 298 | static CONSTEXPR const th_vlseg th_vlsegff_obj; | ^ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:54:7: note: provided for ‘template class riscv_vecto r::th_vlseg’ 54 | class th_vlseg : public function_base { | ^~~~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:298:45: error: uninitialized ‘const riscv_vector::th_vlsegff_obj’ [- fpermissive] 298 | static CONSTEXPR const th_vlseg th_vlsegff_obj; | ^~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:299:44: error: wrong number of template arguments (2, should be at l east 0) 299 | static CONSTEXPR const th_vlseg th_vlseguff_obj; |^ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:54:7: note: provided for ‘template class riscv_vecto r::th_vlseg’ 54 | class th_vlseg : public function_base { | ^~~~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:299:46: error: uninitialized ‘const riscv_vector::th_vlseguff_obj’ [ -fpermissive] 299 | static CONSTEXPR const th_vlseg th_vlseguff_obj; | ^~~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:304:55: error: cannot convert ‘const int*’ to ‘const riscv_vector::f unction_base* const’ in initialization 304 | namespace bases { const function_base *const NAME = &NAME##_obj; } | ^ | | | const int* ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:316:1: note: in expansion of macro ‘BASE’ 316 | BASE (th_vlsegff) | ^~~~ ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:304:55: error: cannot convert ‘const int*’ to ‘const riscv_vector::f unction_base* const’ in initialization 304 | namespace bases { const function_base *const NAME = &NAME##_obj; } | ^ | | | const int* ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/thead-vector-builtins-bases.cc:317:1: note: in expansion of macro ‘BASE’ 317 | BASE (th_vlseguff) | ^~~~
Re: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]
On Mon, 9 Jun 2025, Tamar Christina wrote: > This patch introduces two new vector cbranch optabs vec_cbranch_any and > vec_cbranch_all. > > To explain why we need two new optabs let me explain the current cbranch and > its > limitations and what I'm trying to optimize. So sorry for the long email, but > I > hope it explains why I think we want new optabs. > > Today cbranch can be used for both vector and scalar modes. In both these > cases it's intended to compare boolean values, either scalar or vector. > > The optab documentation does not however state that it can only handle > comparisons against 0. So many targets have added code for the vector variant > that tries to deal with the case where we branch based on two non-zero > registers. > > However this code can't ever be reached because the cbranch expansion only > deals > with comparisons against 0 for vectors. This is because for vectors the rest > of > the compiler has no way to generate a non-zero comparison. e.g. the vectorizer > will always generate a zero comparison, and the C/C++ front-ends won't allow > vectors to be used in a cbranch as it expects a boolean value. ISAs like SVE > work around this by requiring you to use an SVE PTEST intrinsics which results > in a single scalar boolean value that represents the flag values. > > e.g. if (svptest_any (..)) > > The natural question is why do we not at expand time then rewrite the > comparison > to a non-zero comparison if the target supports it. > > The reason is we can't safely do so. For an ANY comparison (e.g. != b) this > is > trivial, but for an ALL comparison (e.g. == b) we would have to flip both > branch > and invert the value being compared. i.e. we have to make it a != b > comparison. > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because they > have already been lowered into a fall through branch (PC) and a label, ready > for > use in an if_then_else RTL expression. > > Additionally as mentioned before, cbranches expect the values to be masks, not > values. This kinda works out if you XOR the values, but for FP vectors you'd > need to know what equality means for the FP format. i.e. it's possible for > IEEE 754 values but It's not immediately obvious if this is true for all > formats. > > Now why does any of this matter? Well there are two optimizations we want to > be > able to do. > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction for > it. >For both Integer and FP vectors we perform the comparisons as EQ and then >invert the resulting mask. Ideally we'd like to replace this with just a > XOR >and the appropriate branch. > > 2. When on an SVE enabled system we would like to use an SVE compare + branch >for the Adv. SIMD sequence which could happen due to cost modelling. > However >we can only do so based on if we know that the values being compared > against >are the boolean masks. This means we can't really use combine to do this >because combine would have to match the entire sequence including the >vector comparisons because at RTL we've lost the information that >VECTOR_BOOLEAN_P would have given us. This sequence would be too long for >combine to match due to it having to match the compare + branch sequence >being generated as well. It also becomes a bit messy to match ANY and ALL >sequences. I guess I didn't really understand the above, esp. why all of this is not recoverable on RTL, so I'll state what I think cbranch is supposed to do and then ask some question on the proposal. cbranch is supposed to branch on the result of a comparison, being it on vector or scalar operands, the result of the comparison is a scalar, or rather in full generality, a (scalar) condition-code. Given it uses a COMPARE RTX as comparison operator we can currently only handle whole vector equality/inequality which map to all/any. > To handle these two cases I propose the new optabs vec_cbranch_any and > vec_cbranch_all that expect the operands to be values, not masks, and the > comparison operation is the comparison being performed. The current cbranch > optab can't be used here because we need to be able to see both comparison > operators (for the boolean branch and the data compare branch). > > The initial != 0 or == 0 is folded away into the name as _any and _all > allowing > the target to easily do as it wishes. > > I have intentionally chosen to require cbranch_optab to still be the main one. > i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch because > these two are an expand time only construct. I've also chosen them to be this > such that there's no change needed to any other passes in the middle-end. > > With these two optabs it's trivial to implement the two optimization I > described > above. A target expansion hook is also possible but optabs felt more natural > for the situation. What's the RTL the optabs expand to for SVE
Re: [PATCH] [RFC] RISC-V: Add extra check to help choosing multilib with equivalent arch.
Oh, yeah, I got your point, I was just misreading, the march is rv32imac rather than rv32imafc, that is because of the complicated implication rule. So I think maybe we should mark C-ext as a EXT_FLAG_MACRO Then skip all EXT_FLAG_MACRO during riscv_subset_list::match_score? something like that: diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 6b5440365e3..b536d3758ce 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -410,10 +410,14 @@ riscv_subset_list::match_score (riscv_subset_list *list) const be complicated. TODO: We might consider the version of each extension. */ for (s = list->m_head; s != NULL; s = s->next) -if (this->lookup (s->name.c_str ()) != NULL) - score++; -else - return 0; +{ + if (this-ext-is-macro-ext) + continue; + if (this->lookup (s->name.c_str ()) != NULL) + score++; + else + return 0; +} return score; } diff --git a/gcc/config/riscv/riscv-ext.def b/gcc/config/riscv/riscv-ext.def index 816acaa34f4..b7d443b0c27 100644 --- a/gcc/config/riscv/riscv-ext.def +++ b/gcc/config/riscv/riscv-ext.def @@ -179,7 +179,7 @@ DEFINE_RISCV_EXT( /* FLAG_GROUP */ base, /* BITMASK_GROUP_ID */ 0, /* BITMASK_BIT_POSITION*/ 2, - /* EXTRA_EXTENSION_FLAGS */ 0) + /* EXTRA_EXTENSION_FLAGS */ EXT_FLAG_MACRO) DEFINE_RISCV_EXT( /* NAME */ b, Could you give a try in this direction? On Mon, Jun 9, 2025 at 5:19 PM yunzezhu wrote: > > > But rv32imaf_zca means we don't have zcf, so it can not select > > rv32imac, otherwise that means the final binary will have zcf > > instruction and that won't be able to run on env that only have > > rv32imaf_zca. > > In my opinion the arch of multilib is the arch used when compiling the lib, > that is the binary of lib rv32imac does not contain zcf instructions. > Since none of binaries to be link contain zcf instructions, I think > the final binary generated by linking does not contain zcf instructions when > select rv32imac.
Re: libstdc++: libstdc++: Implement LWG3528 make_from_tuple can perform (the equivalent of) a C-style cast
On 08/06/25 01:32 +0800, Yrong wrote: Hi libstdc++ experts, This patch implement LWG3528 and also implement an improvement that makes std::make_from_tuple SFINAE friendly. I have implemented this LWG issue and SFINAE enhancements for libc++ and Microsoft STL. This is my first patch for libstdc++, hope I didn't missing anything. Please feel free to tell me if I messed anything up! Thanks for the contribution! The patch is attached as mimetype application/octet-stream which makes it harder to read and reply to in most mailers (it must be saved to file and then opened separately, rather than just replying inline). I think it you save it as a .txt file and attach it then gmail will make it easier to work with. (Even better is to use 'git send-email' but that requires some configuration to be able to send mail via gmail servers, see https://git-send-email.io/ for step-by-step intructions). Please confirm that you are using Signed-off-by: with the meaning explained at https://gcc.gnu.org/dco.html From 7dfeb6bdbae91628244b0712dd46c5bf38bb0fff Mon Sep 17 00:00:00 2001 From: Yihan Wang Date: Sun, 8 Jun 2025 01:06:26 +0800 Subject: [PATCH] libstdc++: Implement LWG3528 make_from_tuple can perform (the equivalent of) a C-style cast Implement LWG3528 and make std::make_from_tuple SFINAE friendly. libstdc++-v3/ChangeLog: * include/std/tuple: * testsuite/20_util/tuple/make_from_tuple/dr3528.cc: New test. Signed-off-by: Yihan Wang --- libstdc++-v3/include/std/tuple| 37 - .../20_util/tuple/make_from_tuple/dr3528.cc | 144 ++ 2 files changed, 175 insertions(+), 6 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/tuple/make_from_tuple/dr3528.cc diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple index 2e69af13a98..6f5d2438eaf 100644 --- a/libstdc++-v3/include/std/tuple +++ b/libstdc++-v3/include/std/tuple @@ -2939,15 +2939,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #ifdef __cpp_lib_make_from_tuple // C++ >= 17 - template +#ifdef __cpp_concepts // >= C++20 +template + requires is_constructible_v< + _Tp, decltype(std::get<_Idx>(std::declval<_Tuple>()))...> constexpr _Tp -__make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>) +#else +template +constexpr __enable_if_t< +is_constructible_v<_Tp, + decltype(std::get<_Idx>(std::declval<_Tuple>()))...>, +_Tp> +#endif Do we really need to constrain __make_from_tuple_impl when it's only called by std::make_from_tuple, which is already constrained? We could just put a static_assert in the body of __make_from_tuple_impl to ensure that it isn't misused. + __make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>) This line doesn't need to be reindented, it was correct before. { return _Tp(std::get<_Idx>(std::forward<_Tuple>(__t))...); } +template + typename _Seq = + make_index_sequence>>> +constexpr inline bool __can_make_from_tuple = false; + +template +constexpr inline bool __can_make_from_tuple<_Tp, _Tuple, +index_sequence<_Idx...>> = +is_constructible_v<_Tp, + decltype(std::get<_Idx>(std::declval<_Tuple>()))...>; + #if __cpp_lib_tuple_like // >= C++23 - template +template This should not be reindented. #else - template +template >> +#endif +#ifdef __cpp_concepts // >= C++20 + requires __can_make_from_tuple<_Tp, _Tuple> #endif Doing it like this means that for C++20 we use enable_if *and* a requires-clause, but we don't need both. constexpr _Tp make_from_tuple(_Tuple&& __t) @@ -2962,7 +2987,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } #endif return __make_from_tuple_impl<_Tp>(std::forward<_Tuple>(__t), -make_index_sequence<__n>{}); + make_index_sequence<__n>{}); } #endif @@ -2970,7 +2995,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template<__tuple_like _TTuple, __tuple_like _UTuple, template class _TQual, template class _UQual, typename = make_index_sequence>> - struct __tuple_like_common_reference; +struct __tuple_like_common_reference; This change *is* correct, but isn't related to make_from_tuple so doesn't belong in this patch. I think the diff below is a simpler implementation of the change, using the new variable template as a "Mandates:" for __make_from_tuple_impl and as a "Constraints:" for make_from_tuple. We can do it with a single new #if / #else addition by giving std::make_from_tuple a trailing-return-type and putting the constraints there. What do you think? --- a/libstdc++-v3/include/std/tuple +++ b/libstdc++-v3/include/std/tuple @@ -2939,19 +2939,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #ifdef __cpp_lib_make_from_tuple // C++ >= 17
Re: [PATCH 13/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic
On Fri, Jun 06, 2025 at 04:04:18PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/ > > SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is > > SVE_RELAXED_GP. > > > > We can only reliably test the 'merging with the third input' (addend) > > and 'independent value' patterns at this stage as the canocalisation that > > reorders the multiplicands based on the second SEL input would be performed > > by the conditional expander. > > > > Another difficulty is that we can't test these fused multiply/SEL combines > > without using __builtin_fma and friends. The reason for this is as follows: > > > > We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will > > canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into > > these conditional forms. Later, when widening_mul tries to fold these into > > conditional fused multiply operations, the transformation fails - simply > > because we haven’t implemented those conditional fused multiply optabs yet. > > > > Hence why this patch lacks tests for BFloat16... > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve.md (*cond__2_relaxed): > > Extend from SVE_FULL_F to SVE_F. > > (*cond__4_relaxed): Extend from SVE_FULL_F_B16B16 > > to SVE_F_B16B16. > > (*cond__any_relaxed): Likewise. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test. > > * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise. > > * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise. > > OK, thanks. > > BTW, I just realised that all my comments saying "please add a token > test that SEL doesn't get dropped" are probably completely bogus, > since those cases will instead be handed by the follow-on patches > to the conditional optabs. Is that right? Please ignore if so. :) Yeah, that's true where we have a conditional optab, but that isn't the case for any of the interesting unary operations. Maybe I should include '_2.c' style tests for each of those, that explicitly test for SEL? Thanks, Spencer > > Sorry, I've been thinking about this a patch at a time while ignoring > the bigger picture. > > Richard > > > --- > > gcc/config/aarch64/aarch64-sve.md | 38 > > .../aarch64/sve/unpacked_cond_fmla_1.c| 43 +++ > > .../aarch64/sve/unpacked_cond_fmls_1.c| 43 +++ > > .../aarch64/sve/unpacked_cond_fnmla_1.c | 43 +++ > > .../aarch64/sve/unpacked_cond_fnmls_1.c | 43 +++ > > 5 files changed, 191 insertions(+), 19 deletions(-) > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_1.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_1.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c > > > > diff --git a/gcc/config/aarch64/aarch64-sve.md > > b/gcc/config/aarch64/aarch64-sve.md > > index 8c1921ddf5c..e5443980e8b 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -7622,15 +7622,15 @@ > > ;; Predicated floating-point ternary operations, merging with the > > ;; first input. > > (define_insn_and_rewrite "*cond__2_relaxed" > > - [(set (match_operand:SVE_FULL_F 0 "register_operand") > > - (unspec:SVE_FULL_F > > + [(set (match_operand:SVE_F 0 "register_operand") > > + (unspec:SVE_F > > [(match_operand: 1 "register_operand") > > - (unspec:SVE_FULL_F > > + (unspec:SVE_F > > [(match_operand 5) > > (const_int SVE_RELAXED_GP) > > - (match_operand:SVE_FULL_F 2 "register_operand") > > - (match_operand:SVE_FULL_F 3 "register_operand") > > - (match_operand:SVE_FULL_F 4 "register_operand")] > > + (match_operand:SVE_F 2 "register_operand") > > + (match_operand:SVE_F 3 "register_operand") > > + (match_operand:SVE_F 4 "register_operand")] > > SVE_COND_FP_TERNARY) > >(match_dup 2)] > > UNSPEC_SEL))] > > @@ -7668,15 +7668,15 @@ > > ;; Predicated floating-point ternary operations, merging with the > > ;; third input. > > (define_insn_and_rewrite "*cond__4_relaxed" > > - [(set (match_operand:SVE_FULL_F_B16B16 0 "register_operand") > > - (unspec:SVE_FULL_F_B16B16 > > + [(set (match_operand:SVE_F_B16B16 0 "register_operand") > > + (unspec:SVE_F_B16B16 > > [(match_operand: 1 "register_operand") > > - (unspec:SVE_FULL_F_B16B16 > > + (unspec:SVE_F_B16B16 > > [(match_operand 5) > > (const_int SVE_RELAXED_GP) > > - (match_operand:SVE_FULL_F_B16B16 2 "register_operand") > > - (match_operand:SVE_FULL_F_B16B16 3 "register_ope
Re: [PATCH 1/4] libstdc++: Format empty chrono-spec for the calendar types directly.
On Fri, Jun 6, 2025 at 1:02 PM Tomasz Kamiński wrote: > This patch change implementation of the formatters for the calendar types, > so they no longer delegate to operator<< for ostream in case of empty > chrono-spec. > Instead of that, we define the behavior of the in terms of format > specifiers, > that format specifiers are supplied by each formatter as argument to > _M_parse > Similary is formatter constructor it's __formatter_chrono from revalant > default > spec, so we preserve functionality of calling format on default cosntructed > formatters. > > Expressing the existing functionality of the operaetor ostream, requires > providing two additional features: > * printing "is not a valid sth" for !ok objects, > * printing a weekday index in the month. > > The formatter functionality is enabled by setting spec _M_debug > (corresponding > to '?') that is currently unused. This is currently supported only for > subset of format specifiers used by the ostream operators. In future, we > could > make this user configurable (by adding '?' after 'L') and cover all flags. > > For the handling of the weekday index (for weekday_indexed, month_weekday, > year_month_weekday), we need to introduce an new format specifier. To not > conflict with future extension we use '%\0' (embedded null) as this > character > cannot be placed in valid format spec. > > Finally, the format strings for calendar types subsets each other, e.g. > year_month_weekday_last ("%Y/%b/%a[last])" contains month_weekday_last, > weekday_last, weekday, e.t.c.. We introduce a _ChronoFormats class that > provide > consteval accessors to format specs, internally sharing they > representations. > > libstdc++-v3/ChangeLog: > > * include/bits/chrono_io.h (__format::_ChronoFormats): Define. > (__formatter_chrono::__formatter_chrono()) > (__formatter_chrono::__formatter_chrono(_ChronoSpec<_CharT>)): > Define. > (__formatter_chrono::_M_parse): Add parameter with default spec, > namd merge it with new values. Handle '%\0' as weekday index > specifier. > (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B) > (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e) > (__formatter_chrono::_M_F): Support _M_debug flag. > (__formatter_chrono::_M_wi, __formatter_chrono::_S_weekday_index): > Define. > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter) > (std::formatter): > Define __defSpec, and pass it as argument to _M_prase and > constructor of __formatter_chrono. > --- > libstdc++-v3/include/bits/chrono_io.h | 430 +++--- > 1 file changed, 381 insertions(+), 49 deletions(-) > > diff --git a/libstdc++-v3/include/bits/chrono_io.h > b/libstdc++-v3/include/bits/chrono_io.h > index c5c5e4bae53..e36cf1ffd31 100644 > --- a/libstdc++-v3/include/bits/chrono_io.h > +++ b/libstdc++-v3/include/bits/chrono_io.h > @@ -238,6 +238,93 @@ namespace __format >operator|=(_ChronoParts& __x, _ChronoParts __y) noexcept >{ return __x = __x | __y; } > > + template > + struct _ChronoFormats > + { > +using _String_view = basic_string_view<_CharT>; > + > +static consteval > +_String_view > +_S_f() noexcept > +{ return _GLIBCXX_WIDEN("%F"); } > + > +static consteval > +_String_view > +_S_ymd() noexcept > +{ return _GLIBCXX_WIDEN("%Y/%b/%d"); } > + > +static consteval > +_String_view > +_S_ym() noexcept > +{ return _S_ymd().substr(0, 5); } > + > +static consteval > +_String_view > +_S_md() noexcept > +{ return _S_ymd().substr(3); } > + > +static consteval > +_String_view > +_S_y() noexcept > +{ return _S_ymd().substr(0, 2); } > + > +static consteval > +_String_view > +_S_m() noexcept > +{ return _S_ymd().substr(3, 2); } > + > +static consteval > +_String_view > +_S_d() noexcept > +{ return _S_ymd().substr(6, 2); } > + > +static consteval > +_String_view > +_S_ymwi() noexcept > +// %\0 is extension for handling weekday index > +{ return _String_view(_GLIBCXX_WIDEN("%Y/%b/%a[%\0]"), 12); } > + > +static consteval > +_String_view > +_S_mwi() noexcept > +{ return _S_ymwi().substr(3); } > + > +static consteval > +_String_view > +_S_wi() noexcept > +{ return _S_ymwi().substr(6); } > + > +static consteval > +_String_view > +_S_w() noexcept > +{ return _S_ymwi().substr(6, 2); } > + > +static consteval > +_String_view > +_S_ymwl() noexcept > +{ return _GLIBCXX_WIDEN("%Y/%b/%a[last]"); } > + > +static consteval > +_String_vie
Re: [AUTOFDO][AARCH64] Add support for profilebootstrap
Kugan Vivekanandarajah writes: > [sending again as the email seems to have not delivered] > > Hi Richard, > >> On 7 Jun 2025, at 1:12 am, Richard Sandiford >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Jan Hubicka writes: Should I go with: +autofdo_target +autofdo_target="i386" +case "${target}" in + aarch64-*-*) +autofdo_target="aarch64" +;; +esac As in the first version? I can test and send a patch for review if there is no other better alternative. >>> >>> This looks OK - I can not think of better alternative. At some point we >>> need to translate the target CPU to directory name. The build machinery >>> is not exactly my strong point, so perhaps someone with more knowledge >>> can chime in. >> >> Can we at least keep the variable name as "cpu_type", rather than add >> a new classification? I don't see a good reason why autofdo would need >> to use its own private naming scheme, rather than sticking to the one >> that gcc/configure* already uses (i.e. the one used for gcc/config/). >> >> It is incovenient that the toplevel doesn't have access to the logic >> used to set that variable though... >> > > I changed it to: > +# Special case cpu_type for x86_64 as it shares AUTO_PROFILE from i386. > +if test "${cpu_type}" = "x86_64" ; then > + cpu_type="i386" > +fs > > Is this ok? Tested on x86_64 and aarch64 linux-gnu. > > Thanks, > Kugan > > From 30445db0509089117bd03c70b0cc00656e79e1ab Mon Sep 17 00:00:00 2001 > From: Kugan Vivekanandarajah > Date: Fri, 6 Jun 2025 22:11:57 -0700 > Subject: [PATCH] [AutoFDO] Fix profile bootstrap for x86_64 > > This patch fixes profile bootstrap for x86_64 by special > caseing cpu_type for x86_64 as it shares AUTO_PROFILE > from i386. > > ChangeLog: > > * configure.ac: Special case cpu_type for x86_64. > * configure: Regenerate. > > Signed-off-by: Kugan Vivekanandarajah > --- > configure| 4 > configure.ac | 4 > 2 files changed, 8 insertions(+) > > diff --git a/configure b/configure > index ed1e5a4e818..8ec8a9f111b 100755 > --- a/configure > +++ b/configure > @@ -3397,6 +3397,10 @@ case "${target}" in > esac > > cpu_type=`echo ${host} | sed 's/-.*$//'` > +# Special case cpu_type for x86_64 as it shares AUTO_PROFILE from i386. > +if test "${cpu_type}" = "x86_64" ; then > + cpu_type="i386" > +fi > > > # Disable libssp for some systems. > diff --git a/configure.ac b/configure.ac > index 33c130f4e02..5fa101f77fd 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -622,6 +622,10 @@ case "${target}" in > esac > > cpu_type=`echo ${host} | sed 's/-.*$//'` > +# Special case cpu_type for x86_64 as it shares AUTO_PROFILE from i386. I'd prefer something like: # Map the first part of the triplet to the GCC config/ name. # Many targets do not use this variable yet, so the logic is intentionally # incomplete. If the variable becomes more widely used in future, # we should probably look at trying to share code with gcc/config.gcc. OK with that change, thanks. Richard > +if test "${cpu_type}" = "x86_64" ; then > + cpu_type="i386" > +fi > AC_SUBST(cpu_type) > > # Disable libssp for some systems.
Re: [PATCH 2/3]AArch64: Support eliding ptest on masked compares [PR118974]
Tamar Christina writes: > In the example > > void f1 () > { > for (int i = 0; i < N; i++) > { > b[i] += a[i]; > if (a[i] > 0) > break; > } > } > > when compiled for SVE we generate: > > ld1wz28.s, p7/z, [x4, x0, lsl 2] > cmpgt p14.s, p7/z, z28.s, #0 > ptest p15, p14.b > b.none .L3 > > Where the ptest isn't needed since the branch only cares about the Z and NZ > flags. > > GCC Today supports eliding this through the pattern *cmp_ptest > however this pattern only supports the removal when the outermost context is a > CMP where the predicate is inside the condition itself. > > This typically only happens for an unpredicated CMP as a ptrue will be > generated > during expand. > > In the case about at the GIMPLE level we have > > mask_patt_14.15_57 = vect__2.11_52 > { 0, ... }; > vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57; > if (vec_mask_and_58 != { 0, ... }) > goto ; [5.50%] > else > goto ; [94.50%] > > where the loop mask is applied to the compare as an AND. > > The loop mask is moved into the compare by the pattern *cmp_and > which moves the mask inside if the current mask is a ptrue since > p && true -> p. > > However this happens after combine, and so we can't both move the predicate > inside AND eliminate the ptests. > > To fix this this patch adds a new pattern *cmp_and_ptest which > combines these two patterns together allowing us to both push the predicate > inside and eliminate the ptest. > > After this patch we generate > > ld1wz28.s, p7/z, [x4, x0, lsl 2] > cmpgt p14.s, p7/z, z28.s, #0 > b.none .L3 > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/118974 > * config/aarch64/aarch64-sve.md (*cmp_and_ptest): New. > > gcc/testsuite/ChangeLog: > > PR target/118974 > * gcc.target/aarch64/sve/pr119351.c: Update codegen. > * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise. > > --- > > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index > bf7569f932b6d7392b9c4fb7b94efafb6fd184c2..fe7f52ee1ed400b4eda28e3f90edc0044a5aa7a9 > 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -8319,6 +8319,40 @@ (define_insn_and_rewrite "*cmp_ptest" >} > ) > > +;; Predicated integer comparisons, formed by combining a PTRUE-predicated > +;; comparison with an AND in which only the flags result is interesting. > +(define_insn_and_rewrite "*cmp_and_ptest" > + [(set (reg:CC_NZC CC_REGNUM) > + (unspec:CC_NZC > + [(match_operand:VNx16BI 1 "register_operand") > +(match_operand 4) > +(const_int SVE_KNOWN_PTRUE) > +(and: > + (unspec: > +[(match_operand 5) > + (const_int SVE_KNOWN_PTRUE) > + (SVE_INT_CMP: > + (match_operand:SVE_I 2 "register_operand") > + (match_operand:SVE_I 3 > "aarch64_sve_cmp__operand"))] > +UNSPEC_PRED_Z) > + (match_operand: 6 "register_operand"))] > + UNSPEC_PTEST)) > + (clobber (match_scratch: 0))] > + "TARGET_SVE" > + {@ [ cons: =0, 1, 2 , 3; attrs: pred_clobber ] > + [ &Upa, Upl, w , ; yes ] > cmp\t%0., %6/z, %2., #%3 > + [ ?Upl, 0 , w , ; yes ] ^ > + [ Upa , Upl, w , ; no ] ^ > + [ &Upa, Upl, w , w; yes ] > cmp\t%0., %6/z, %2., %3. > + [ ?Upl, 0 , w , w; yes ] ^ > + [ Upa , Upl, w , w; no ] ^ > + } > + "&& !rtx_equal_p (operands[4], operands[5])" > + { > +operands[5] = copy_rtx (operands[4]); > + } > +) > + This doesn't look like a legitimate fold, since the LAST flag from the original PTEST will apply to the last bit of the predicate, which for a .H, .S, or .D comparison will be zero. The fused compare will instead set the LAST flag based on the last full SVE_I element. For example, for a .D comparison on 128-bit SVE, we'd get (lsb first): a000 b000 where a and b are the comparison results. The ptest is testing this against a governing predicate of: so LAST should always be zero. The fold above would instead set LAST to b. I think we need to differentiate users that care about LAST from those that don't, such as by using CC_Z for NE/EQ ANY/NONE comparisons. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151 for more details. Thanks, Richard > ;; Predicated integer comparisons, formed by combining a PTRUE-predicated > ;; comparison with an AND. Split the instruction into its preferred form > ;; at the earliest opportunity, in order to get rid of the redundant > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr119351.c > b/gcc/testsuite/gcc.target/aarch64/sve/