[PATCH 1/2] i386: Deprecate -m[no-]avx10.1 and make -mno-avx10.1-512 disable the whole AVX10.1
Based on the feedback we got, we would like to re-alias avx10.x to 512 bit in the future.  That would leave the current 256-bit avx10.1 alias inconsistent.  Since the alias has been there for GCC 14.1 and GCC 14.2, we decided to deprecate it.  The current proposal is not to add it back in the future, but that might change if necessary.

For -mno- options, it is confusing what is being disabled when it comes to AVX10.  Since there is barely any usage that enables AVX10 with 512 bit and then disables it, we will only provide -mno-avx10.x options in the future, disabling the whole AVX10.x.  If someone really wants to disable 512 bit after enabling it, -mavx10.x-512 -mno-avx10.x -mavx10.x-256 is the only way to do that, since we also do not want to break the usual expectation that -m options enable everything they mention.

However, since we deprecated avx10.1, there is no reason to have -mno-avx10.1; thus we need to keep -mno-avx10.1-[256,512].  To avoid confusion, we will make -mno-avx10.1-512 disable the whole AVX10.1 set, matching the future -mno-avx10.x.

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX2_UNSET):
	Change AVX10.1 unset macro.
	(OPTION_MASK_ISA2_AVX10_1_256_UNSET): Removed.
	(OPTION_MASK_ISA2_AVX10_1_512_UNSET): Removed.
	(OPTION_MASK_ISA2_AVX10_1_UNSET): New.
	(ix86_handle_option): Adjust AVX10.1 unset macro.
	* common/config/i386/i386-isas.h: Remove avx10.1.
	* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
	Ditto.
	(ix86_option_override_internal): Adjust warning message.
	* config/i386/i386.opt: Remove mavx10.1.
	* doc/extend.texi: Remove avx10.1 and adjust doc.
	* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10-check.h: Change to avx10.1-256.
	* gcc.target/i386/avx10_1-1.c: Ditto.
	* gcc.target/i386/avx10_1-13.c: Ditto.
	* gcc.target/i386/avx10_1-14.c: Ditto.
	* gcc.target/i386/avx10_1-21.c: Ditto.
	* gcc.target/i386/avx10_1-22.c: Ditto.
	* gcc.target/i386/avx10_1-23.c: Ditto.
	* gcc.target/i386/avx10_1-24.c: Ditto.
	* gcc.target/i386/avx10_1-3.c: Ditto.
	* gcc.target/i386/avx10_1-5.c: Ditto.
	* gcc.target/i386/avx10_1-6.c: Ditto.
	* gcc.target/i386/avx10_1-8.c: Ditto.
	* gcc.target/i386/pr117946.c: Ditto.
	* gcc.target/i386/avx10_1-12.c: Adjust warning message.
	* gcc.target/i386/avx10_1-19.c: Ditto.
	* gcc.target/i386/avx10_1-17.c: Adjust to no-avx10.1-512.
---
 gcc/common/config/i386/i386-common.cc       | 18 --
 gcc/common/config/i386/i386-isas.h          |  1 -
 gcc/config/i386/i386-options.cc             |  3 +--
 gcc/config/i386/i386.opt                    |  5 -
 gcc/doc/extend.texi                         | 11 ---
 gcc/doc/sourcebuild.texi                    |  5 +
 gcc/testsuite/gcc.target/i386/avx10-check.h |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-12.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-13.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-14.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-17.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/avx10_1-19.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-21.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-22.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-23.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-24.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-5.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-6.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-8.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr117946.c    |  2 +-
 22 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 52ad1c5acd1..793d6845684 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -249,7 +249,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \
    | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \
    | OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET \
-   | OPTION_MASK_ISA2_AVX10_1_256_UNSET)
+   | OPTION_MASK_ISA2_AVX10_1_UNSET)
 #define OPTION_MASK_ISA_AVX512F_UNSET \
   (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \
    | OPTION_MASK_ISA_AVX512DQ_UNSET | OPTION_MASK_ISA_AVX512BW_UNSET \
@@ -325,11 +325,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
 #define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR
-#define OPTION_MASK_ISA2_AVX10_1_256_UNSET \
-  (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_
[PATCH 0/2] i386: Adjust AVX10 related options
Hi all,

According to the previous feedback on our RFC for the AVX10 option adjustment, and after discussion with LLVM, we have finalized how we are going to handle it.  The overall direction is to re-alias avx10.x to 512 bit and use only -mno-avx10.x to disable everything, instead of the current confusing -mno-avx10.x-[256,512], which we are deprecating.

This is fine for AVX10.2 since it was just introduced.  However, it becomes tricky for AVX10.1, which was introduced in GCC 14.  Thus, we will deprecate the avx10.1 alias.  For -mno- options, since we no longer have avx10.1, having -mno-avx10.1 would be odd.  We will keep both -mno-avx10.1-256 and -mno-avx10.1-512, while changing -mno-avx10.1-512 to also disable the whole AVX10.1, aligning it with the future behavior.

As for redesigning the options so that the last-mentioned width determines the AVX10 vector size, we chose not to do that: it would break the expectation that -m options enable everything they mention, and it would make combinations like -mavx10.2-512 -mavx10.4-256 lose the flexibility of enabling only 512 bit on AVX10.1/2 while enabling 256 bit on AVX10.3/4.

The two patches follow; the first will be backported to GCC 14.

Ok for trunk?

Thx,
Haochen
Re: [PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test
On Wed, 12 Feb 2025, Andrew Pinski wrote:

> On Wed, Feb 12, 2025 at 4:04 AM Richard Biener wrote:
> >
> > The PR indicates a very specific issue with regard to SSA coalescing
> > failures because there's a pre IV increment loop exit test.  While
> > IVOPTs created the desired IL we later simplify the exit test into
> > the undesirable form again.  The following fixes this up during RTL
> > expansion where we try to improve coalescing of IVs.  That seems
> > easier than trying to avoid the simplification with some weird
> > heuristics (it could also have been written this way).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> >     PR tree-optimization/86270
> >     * tree-outof-ssa.cc (insert_backedge_copies): Pattern
> >     match a single conflict in a loop condition and adjust
> >     that avoiding the conflict if possible.
> >
> >     * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
> >     copies as well.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
> >  gcc/tree-outof-ssa.cc                   | 49 ++---
> >  2 files changed, 47 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c
> > b/gcc/testsuite/gcc.target/i386/pr86270.c
> > index 68562446fa4..89b9aeb317a 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> > @@ -13,3 +13,6 @@ test ()
> >
> >  /* Check we do not split the backedge but keep nice loop form.  */
> >  /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
> > +/* Check we do not end up with reg-reg moves from a pre-increment IV
> > +   exit test.  */
> > +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" } } */
> > diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> > index d340d4ba529..f285c81599e 100644
> > --- a/gcc/tree-outof-ssa.cc
> > +++ b/gcc/tree-outof-ssa.cc
> > @@ -1259,10 +1259,9 @@ insert_backedge_copies (void)
> >           if (gimple_nop_p (def)
> >               || gimple_code (def) == GIMPLE_PHI)
> >             continue;
> > -         tree name = copy_ssa_name (result);
> > -         gimple *stmt = gimple_build_assign (name, result);
> >           imm_use_iterator imm_iter;
> >           gimple *use_stmt;
> > +         auto_vec uses;
> >           /* The following matches trivially_conflicts_p.  */
> >           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
> >             {
> > @@ -1273,11 +1272,51 @@ insert_backedge_copies (void)
> >                 {
> >                   use_operand_p use;
> >                   FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> > -                   SET_USE (use, name);
> > +                   uses.safe_push (use);
> >                 }
> >             }
> > -         gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > -         gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> > +         /* When there is just a conflicting statement try to
> > +            adjust that to refer to the new definition.
> > +            In particular for now handle a conflict with the
> > +            use in a (exit) condition with a NE compare,
> > +            replacing a pre-IV-increment compare with a
> > +            post-IV-increment one.  */
> > +         if (uses.length () == 1
> > +             && is_a (USE_STMT (uses[0]))
> > +             && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
> > +             && is_gimple_assign (def)
> > +             && gimple_assign_rhs1 (def) == result
> > +             && (gimple_assign_rhs_code (def) == PLUS_EXPR
> > +                 || gimple_assign_rhs_code (def) == MINUS_EXPR
> > +                 || gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR)
> > +             && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
> > +           {
> > +             gcond *cond = as_a (USE_STMT (uses[0]));
> > +             tree *adj;
> > +             if (gimple_cond_lhs (cond) == result)
> > +               adj = gimple_cond_rhs_ptr (cond);
> > +             else
> > +               adj = gimple_cond_lhs_ptr (cond);
> > +             tree name = copy_ssa_name (result);
>
> Should this be `copy_ssa_name (*adj)`?  Since the new name is based on
> `*adj` rather than based on the result.

Good point, I've adjusted this in my local copy.

Richard.

> Thanks,
> Andrew Pinski
>
> > +             gimple *stmt
> > +               = gimple_build_assign (name,
> > +                                      gimple_assign_rhs_code (def),
> > +
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
>>> Other thoughts?
>>
>> The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
>> we can't/don't model in the pipeline, but I have no idea how to model
>> the VL=0 case there.
> Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
> the first time a hook got (ab)used in ways that weren't part of the
> original intent.

I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because it might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?  Also, why doesn't the vsetvl pass fix the situation?

IMHO we need to understand the problem more thoroughly before changing
things.  In the end LCM minimizes the number of vsetvls and inserts them
at the "earliest" point.  If that is not sufficient I'd say we need to
modify the constraints (maybe on a per-uarch basis)?

On a separate note: how about we move the vsetvl pass after sched2?  Then
we could at least rely on LCM doing its work uninhibited and wouldn't
reorder vsetvls afterwards.  Or do we somehow rely on rtl_dce and BB
reorder to run afterwards?  That won't help with the problem here but
might with others.

--
Regards
Robin
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 20:46, Jeff Law wrote:
>> BTW what exactly is speculative scheduling ?  As in what is it actually
>> trying to schedule ahead ?
> In simplest terms assume we have this kind of graph
>
>       0
>      / \
>     1-->2
>
> The scheduler knows how to build scheduling regions, essentially
> extended basic blocks.  In this case we have two regions, one with the
> blocks 0,1 the other being just block 2.
>
> In the multi-block region 0,1 we allow insns from block 1 to speculate
> into block 0.
>
> Let's assume we're on a simple 2-wide in-order machine and somewhere in
> bb0 there's a slot available for an insn that we couldn't fill with
> anything useful from bb0.  In that case we may speculate an insn from
> bb1 into bb0 to execute "for free" in that unused slot.
>
> That's the basic idea.  It was particularly helpful for in-order cores
> in the past.  It's dramatically less important for an out-of-order core
> since those are likely doing the speculation in hardware.

That is great info, super helpful.

Given this background, I'd argue that Edwin's patch to barricade vsetvls
in scheduling is the right thing to do anyway, this issue or otherwise.

> Naturally if you're using icounts for evaluation this kind of behavior
> is highly undesirable since that kind of evaluation says the
> transformation is bad, but in reality on certain designs it is quite
> helpful.

Sure.

Thx,
-Vineet
Re: [PATCH] libstdc++: Implement P3138R5 views::cache_latest
On Thu, 13 Feb 2025, Jonathan Wakely wrote: > On Tue, 11 Feb 2025 at 05:59, Patrick Palka wrote: > > > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > > > -- >8 -- > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/version.def (ranges_cache_latest): Define. > > * include/bits/version.h: Regenerate. > > * include/std/ranges (cache_latest_view): Define for C++26. > > (cache_latest_view::_Iterator): Likewise. > > (cache_latest_view::_Sentinel): Likewise. > > (views::__detail::__can_cache_latest): Likewise. > > (views::_CacheLatest, views::cache_latest): Likewise. > > * testsuite/std/ranges/adaptors/cache_latest/1.cc: New test. > > The test is missing from the patch. Whoops, below is the complete patch. > > > --- > > libstdc++-v3/include/bits/version.def | 8 ++ > > libstdc++-v3/include/bits/version.h | 10 ++ > > libstdc++-v3/include/std/ranges | 189 ++ > > 3 files changed, 207 insertions(+) > > > > diff --git a/libstdc++-v3/include/bits/version.def > > b/libstdc++-v3/include/bits/version.def > > index 002e560dc0d..6fb5db2e1fc 100644 > > --- a/libstdc++-v3/include/bits/version.def > > +++ b/libstdc++-v3/include/bits/version.def > > @@ -1837,6 +1837,14 @@ ftms = { > >}; > > }; > > > > +ftms = { > > + name = ranges_cache_latest; > > + values = { > > +v = 202411; > > +cxxmin = 26; > > + }; > > +}; > > + > > ftms = { > >name = ranges_concat; > >values = { > > diff --git a/libstdc++-v3/include/bits/version.h > > b/libstdc++-v3/include/bits/version.h > > index 70de189b1e0..db61a396c45 100644 > > --- a/libstdc++-v3/include/bits/version.h > > +++ b/libstdc++-v3/include/bits/version.h > > @@ -2035,6 +2035,16 @@ > > #endif /* !defined(__cpp_lib_is_virtual_base_of) && > > defined(__glibcxx_want_is_virtual_base_of) */ > > #undef __glibcxx_want_is_virtual_base_of > > > > +#if !defined(__cpp_lib_ranges_cache_latest) > > +# if (__cplusplus > 202302L) > > +# define __glibcxx_ranges_cache_latest 202411L > > +# if defined(__glibcxx_want_all) || > > 
defined(__glibcxx_want_ranges_cache_latest) > > +# define __cpp_lib_ranges_cache_latest 202411L > > +# endif > > +# endif > > +#endif /* !defined(__cpp_lib_ranges_cache_latest) && > > defined(__glibcxx_want_ranges_cache_latest) */ > > +#undef __glibcxx_want_ranges_cache_latest > > + > > #if !defined(__cpp_lib_ranges_concat) > > # if (__cplusplus > 202302L) > > # define __glibcxx_ranges_concat 202403L > > diff --git a/libstdc++-v3/include/std/ranges > > b/libstdc++-v3/include/std/ranges > > index 5c795a90fbc..db9a00be264 100644 > > --- a/libstdc++-v3/include/std/ranges > > +++ b/libstdc++-v3/include/std/ranges > > @@ -58,6 +58,7 @@ > > #define __glibcxx_want_ranges_as_const > > #define __glibcxx_want_ranges_as_rvalue > > #define __glibcxx_want_ranges_cartesian_product > > +#define __glibcxx_want_ranges_cache_latest > > #define __glibcxx_want_ranges_concat > > #define __glibcxx_want_ranges_chunk > > #define __glibcxx_want_ranges_chunk_by > > @@ -1534,6 +1535,8 @@ namespace views::__adaptor > > this->_M_payload._M_apply(_Optional_func{__f}, __i); > > return this->_M_get(); > > } > > + > > + using _Optional_base<_Tp>::_M_reset; I also forgot to mention this change in the ChangeLog. > >}; > > > > template > > @@ -10203,6 +10206,192 @@ namespace ranges > > } // namespace ranges > > #endif // __cpp_lib_ranges_concat > > > > +#if __cpp_lib_ranges_cache_latest // C++ >= 26 > > +namespace ranges > > +{ > > + template > > +requires view<_Vp> > > + class cache_latest_view : public view_interface> > > + { > > +_Vp _M_base = _Vp(); > > + > > +using __cache_t = conditional_t>, > > + add_pointer_t>, > > + range_reference_t<_Vp>>; > > __conditional_t is cheaper to instantiate than conditional_t, so when > it doesn't affect the mangled name of a public symbol we should prefer > __conditional_t. Ack, fixed below. -- >8 -- libstdc++-v3/ChangeLog: * include/bits/version.def (ranges_cache_latest): Define. * include/bits/version.h: Regenerate. 
* include/std/ranges (__detail::__non_propagating_cache::_M_reset): Export from base class _Optional_base. (cache_latest_view): Define for C++26. (cache_latest_view::_Iterator): Likewise. (cache_latest_view::_Sentinel): Likewise. (views::__detail::__can_cache_latest): Likewise. (views::_CacheLatest, views::cache_latest): Likewise. * testsuite/std/ranges/adaptors/cache_latest/1.cc: New test. --- libstdc++-v3/include/bits/version.def | 8 + libstdc++-v3/include/bits/version.h | 10 + libstdc++-v3/include/std/ranges | 189 ++ .../std/ranges/adaptors/cache_latest/1.cc | 72 +++ 4 files changed, 279 insertions(+)
Re: [PATCH v2 2/8] LoongArch: Allow moving TImode vectors
Hi, If only apply the first and second patches, the code will not compile. Otherwise LGTM. Thanks! 在 2025/2/13 下午5:41, Xi Ruoyao 写道: We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. For the preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX registers so we won't get a reload failure when we start to save TImode vectors in these registers. This implicitly depends on the vrepli optimization: without it we'd try "vrepli.q" which does not really exist and trigger an ICE. gcc/ChangeLog: * config/loongarch/lsx.md (mov): Remove. (movmisalign): Remove. (mov_lsx): Remove. * config/loongarch/lasx.md (mov): Remove. (movmisalign): Remove. (mov_lasx): Remove. * config/loongarch/simd.md (ALLVEC_TI): New mode iterator. (mov): Likewise. (mov_simd): New define_insn_and_split. --- gcc/config/loongarch/lasx.md | 40 -- gcc/config/loongarch/lsx.md | 36 --- gcc/config/loongarch/simd.md | 42 3 files changed, 42 insertions(+), 76 deletions(-) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index a37c85a25a4..d82ad61be60 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -699,46 +699,6 @@ (define_expand "lasx_xvrepli" DONE; }) -(define_expand "mov" - [(set (match_operand:LASX 0) - (match_operand:LASX 1))] - "ISA_HAS_LASX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - - -(define_expand "movmisalign" - [(set (match_operand:LASX 0) - (match_operand:LASX 1))] - "ISA_HAS_LASX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -;; 256-bit LASX modes can only exist in LASX registers or memory. 
-(define_insn "mov_lasx" - [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f") - (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))] - "ISA_HAS_LASX" - { return loongarch_output_move (operands); } - [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert") - (set_attr "mode" "") - (set_attr "length" "8,4,4,4,4")]) - - -(define_split - [(set (match_operand:LASX 0 "nonimmediate_operand") - (match_operand:LASX 1 "move_operand"))] - "reload_completed && ISA_HAS_LASX - && loongarch_split_move_p (operands[0], operands[1])" - [(const_int 0)] -{ - loongarch_split_move (operands[0], operands[1]); - DONE; -}) ;; LASX (define_insn "add3" diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index ca0066a21ed..bcc5ae85fb3 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -575,42 +575,6 @@ (define_insn "lsx_vshuf_" [(set_attr "type" "simd_sld") (set_attr "mode" "")]) -(define_expand "mov" - [(set (match_operand:LSX 0) - (match_operand:LSX 1))] - "ISA_HAS_LSX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -(define_expand "movmisalign" - [(set (match_operand:LSX 0) - (match_operand:LSX 1))] - "ISA_HAS_LSX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -(define_insn "mov_lsx" - [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f,*r") - (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r,*r"))] - "ISA_HAS_LSX" -{ return loongarch_output_move (operands); } - [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert,simd_copy") - (set_attr "mode" "")]) - -(define_split - [(set (match_operand:LSX 0 "nonimmediate_operand") - (match_operand:LSX 1 "move_operand"))] - "reload_completed && ISA_HAS_LSX - && loongarch_split_move_p (operands[0], operands[1])" - [(const_int 0)] -{ - loongarch_split_move (operands[0], operands[1]); - DONE; -}) ;; Integer operations (define_insn "add3" diff --git 
a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index 7605b17d21e..61fc1ab20ad 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -130,6 +130,48 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3") ;; instruction here so we can avoid duplicating logics. ;; === + +;; Move + +;; Some immediate values in V1TI or V2TI may be stored in LSX or LASX +;; registers, thus we need to allow moving them for reload. +(define_mode_iterator ALLVEC_TI [ALLVEC +(V1TI "ISA_HAS_LSX") +(V2TI "ISA_HAS_LASX")]) + +(define_expand "mov" + [(set (match_operand:ALLVEC_TI 0) + (match_operand:ALLVEC_TI 1))] + "" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) +DONE; +}) + +(define_expand
[PATCH] rx: allow cmpstrnsi len to be zero
The SCMPU instruction doesn't change the C and Z flags when the incoming length is zero, which means the insn will produce a value based upon the existing flag values.  As a quick kludge, adjust these flags to ensure a zero result in this case.

Signed-off-by: Keith Packard
---
 gcc/config/rx/rx.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index 89211585c9c..edb2c96603f 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -2590,7 +2590,9 @@ (define_insn "rx_cmpstrn"
    (clobber (reg:SI 3))
    (clobber (reg:CC CC_REG))]
   "rx_allow_string_insns"
-  "scmpu		; Perform the string comparison
+  "setpsw z		; Set flags in case len is zero
+   setpsw c
+   scmpu		; Perform the string comparison
    mov	#-1, %0	; Set up -1 result (which cannot be created
 			; by the SC insn)
    bnc?+		; If Carry is not set skip over
--
2.47.2
[PATCH 2/2] i386: Re-alias avx10.2 to 512 bit and deprecate -mno-avx10.2-[256, 512]
As mentioned in the avx10.1 option deprecation patch, based on the feedback we got, we would like to re-alias avx10.x to 512 bit.  For -mno- options, as also mentioned in the previous patch, it is confusing what is being disabled when it comes to AVX10.  So from AVX10.2 on we will only provide -mno-avx10.x options, disabling the whole AVX10.x.

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_1_UNSET):
	Adjust macro.
	(OPTION_MASK_ISA2_AVX10_2_256_UNSET): Removed.
	(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
	(OPTION_MASK_ISA2_AVX10_2_UNSET): New.
	(ix86_handle_option): Remove disable part for avx10.2-256.
	Rename avx10.2-512 switch case to avx10.2 and adjust disable
	part macro.
	* common/config/i386/i386-isas.h: Adjust avx10.2 and avx10.2-512.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Do not
	append -mno-avx10.x-256 for -march=native.
	* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
	Adjust avx10.2 and avx10.2-512.
	* config/i386/i386.opt: Reject Negative for mavx10.2-256.  Alias
	mavx10.2-512 to mavx10.2.  Reject Negative for mavx10.2-512.
	* doc/extend.texi: Adjust documentation.
	* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c: Add missing
	avx10_2_512 check.
	* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
	* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
	* gcc.target/i386/avx10-check.h: Change avx10.2 to avx10.2-256.
	* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
	* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: Ditto.
	* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: Ditto.
	* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: Ditto.
	* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: Ditto.
	* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
	* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
	* gcc.target/i386/avx10_2-comibf-1.c: Ditto.
	* gcc.target/i386/avx10_2-comibf-2.c: Ditto.
	* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
	* gcc.target/i386/avx10_2-comibf-4.c: Ditto.
	* gcc.target/i386/avx10_2-compare-1.c: Ditto.
	* gcc.target/i386/avx10_2-compare-1b.c: Ditto.
	* gcc.target/i386/avx10_2-convert-1.c: Ditto.
	* gcc.target/i386/avx10_2-media-1.c: Ditto.
	* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
	* gcc.target/i386/avx10_2-movrs-1.c: Ditto.
	* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: Ditto.
	* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: Ditto.
	* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: Ditto.
	* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: Ditto.
	* gcc.target/i386/avx10_2-rounding-1.c: Ditto.
	* gcc.target/i386/avx10_2-rounding-2.c: Ditto.
	* gcc.target/i386/avx10_2-rounding-3.c: Ditto.
	* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
	* gcc.target/i386/avx10_2-vaddbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcmpbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcomisbf16-1.c: Ditto.
	* gcc.target/i386/avx10_2-vcomisbf16-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
	* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
	* gcc.targe
[PING 2] [PATCH v2] rs6000: Inefficient vector splat of small V2DI constants [PR107757]
Ping. I have incorporated review comments from Peter in this revised patch. The comment was to remove -mvsx option from dg-options as this is implied by -mcpu=power8. Ok for trunk? Regards, Surya On 09/01/25 8:53 pm, Surya Kumari Jangala wrote: > Ping > > On 02/12/24 2:20 pm, Surya Kumari Jangala wrote: >> I have incorporated review comments in this patch. >> >> Regards, >> Surya >> >> >> rs6000: Inefficient vector splat of small V2DI constants [PR107757] >> >> On P8, for vector splat of double word constants, specifically -1 and 1, >> gcc generates inefficient code. For -1, gcc generates two instructions >> (vspltisw and vupkhsw) whereas only one instruction (vspltisw) is >> sufficient. For constant 1, gcc generates a load of the constant from >> .rodata instead of the instructions vspltisw and vupkhsw. >> >> The routine vspltisw_vupkhsw_constant_p() returns true if the constant >> can be synthesized with instructions vspltisw and vupkhsw. However, for >> constant 1, this routine returns false. >> >> For constant -1, this routine returns true. Vector splat of -1 can be >> done with only one instruction, i.e., vspltisw. We do not need two >> instructions. Hence this routine should return false for -1. >> >> With this patch, gcc generates only one instruction (vspltisw) >> for -1. And for constant 1, this patch generates two instructions >> (vspltisw and vupkhsw). >> >> 2024-11-20 Surya Kumari Jangala >> >> gcc/ >> PR target/107757 >> * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): >> Return false for -1 and return true for 1. >> >> gcc/testsuite/ >> PR target/107757 >> * gcc.target/powerpc/pr107757-1.c: New. >> * gcc.target/powerpc/pr107757-2.c: New. 
>> --- >> gcc/config/rs6000/rs6000.cc | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr107757-1.c | 14 ++ >> gcc/testsuite/gcc.target/powerpc/pr107757-2.c | 13 + >> 3 files changed, 28 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-1.c >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-2.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index 02a2f1152db..d0c528f4d5f 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -6652,7 +6652,7 @@ vspltisw_vupkhsw_constant_p (rtx op, machine_mode >> mode, int *constant_ptr) >> return false; >> >>value = INTVAL (elt); >> - if (value == 0 || value == 1 >> + if (value == 0 || value == -1 >>|| !EASY_VECTOR_15 (value)) >> return false; >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-1.c >> b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c >> new file mode 100644 >> index 000..49076fba255 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c >> @@ -0,0 +1,14 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */ >> +/* { dg-require-effective-target powerpc_vsx } */ >> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */ >> +/* { dg-final { scan-assembler {\mvupkhsw\M} } } */ >> +/* { dg-final { scan-assembler-not {\mlvx\M} } } */ >> + >> +#include >> + >> +vector long long >> +foo () >> +{ >> + return vec_splats (1LL); >> +} >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-2.c >> b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c >> new file mode 100644 >> index 000..4955696f11d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */ >> +/* { dg-require-effective-target powerpc_vsx } */ >> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */ >> +/* { dg-final { scan-assembler-not {\mvupkhsw\M} } } */ >> + >> +#include >> + >> +vector 
long long >> +foo () >> +{ >> + return vec_splats (~0LL); >> +} >
[PATCH v2] x86: Properly find the maximum stack slot alignment
Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use_1): Likewise.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.
	* gcc.target/i386/pr109780-3.c: Likewise.

-- 
H.J.

From 820f939a024fc71e4e37b509a3aa0290e8c4e9df Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH v2] x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use_1): Likewise.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.
	* gcc.target/i386/pr109780-3.c: Likewise.

Signed-off-by: H.J. Lu
---
 gcc/config/i386/i386.cc                    | 173 ++---
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 +
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  39 +
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 +++
 gcc/testsuite/gcc.target/i386/pr109780-3.c |  52 +++
 6 files changed, 350 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-3.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79..4d855d9541c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8466,6 +8466,110 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+     alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+    if (MEM_P (*iter))
+      {
+	unsigned int alignment = MEM_ALIGN (*iter);
+	unsigned int *stack_alignment
+	  = (unsigned int *) data;
+	if (alignment > *stack_alignment)
+	  *stack_alignment = alignment;
+	break;
+      }
+}
+
+/* Helper function for ix86_find_all_reg_use.  */
+
+static void
+ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
+			 auto_bitmap &worklist)
+{
+  rtx src = SET_SRC (set);
+  if (MEM_P (src))
+    return;
+
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest))
+    return;
+
+  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
+    return;
+
+  /* Add this register to stack_slot_access.  */
+  add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
+  bitmap_set_bit (worklist, REGNO (dest));
+}
+
+/* Find all registers defined with REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+		       unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+       ref != NULL;
+       ref = DF_REF_NEXT_REG (ref))
+    {
+      if (DF_REF_IS_ARTIFICIAL (ref))
+	continue;
+
+      rtx_insn *insn = DF_REF_INSN (ref);
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      if (CALL_P (insn) || JUMP_P (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      if (set)
+	ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
+
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != PARALLEL)
+	continue;
+
+      for (int i = 0; i < XVECLEN (pat, 0); i++)
+	{
+	  rtx exp = XVECEXP (pat, 0, i);
+	  switch (GET_CODE (exp))
+	    {
+	    case ASM_OPERANDS:
+	    case CLOBBER:
+	    case PREFETCH:
+	    case USE:
+	      break;
+	    case UNSPEC:
+	    case UNSPEC_VOLATILE:
+	      for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--)
+		{
+		  rtx x = XVECEXP (exp, 0, j);
+		  if (GET_CODE (x) == SET)
+		    ix86_find_all_reg_use_1 (x,
[PATCH v2] LoongArch: Adjust the cost of ADDRESS_REG_REG.
After changing this cost from 1 to 3, the performance of SPEC CPU 2006
benchmarks 401, 473, 416, 465, and 482 can be improved by about 2% on
LA664.

Add option '-maddr-reg-reg-cost='.

gcc/ChangeLog:

	* config/loongarch/genopts/loongarch.opt.in: Add option
	'-maddr-reg-reg-cost='.
	* config/loongarch/loongarch-def.cc
	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize
	addr_reg_reg_cost to 3.
	* config/loongarch/loongarch-opts.cc
	(loongarch_target_option_override): If '-maddr-reg-reg-cost=' is
	not used, set it to the initial value.
	* config/loongarch/loongarch-tune.h (struct loongarch_rtx_cost_data):
	Add the member addr_reg_reg_cost and its assignment function to the
	structure loongarch_rtx_cost_data.
	* config/loongarch/loongarch.cc (loongarch_address_insns): Use
	la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG.
	* config/loongarch/loongarch.opt: Regenerate.
	* config/loongarch/loongarch.opt.urls: Regenerate.
	* doc/invoke.texi: Add description of '-maddr-reg-reg-cost='.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/const-double-zero-stx.c: Add
	'-maddr-reg-reg-cost=1'.
	* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.
Change-Id: I8fbf7a6d073b16c7829b1a9a8d239b131d53ab1b --- gcc/config/loongarch/genopts/loongarch.opt.in | 4 gcc/config/loongarch/loongarch-def.cc | 1 + gcc/config/loongarch/loongarch-opts.cc | 3 +++ gcc/config/loongarch/loongarch-tune.h | 7 +++ gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.opt | 4 gcc/config/loongarch/loongarch.opt.urls| 3 +++ gcc/doc/invoke.texi| 7 ++- gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +- gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c | 2 +- 10 files changed, 31 insertions(+), 4 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8c292c8600d..39c1545e540 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -177,6 +177,10 @@ mbranch-cost= Target RejectNegative Joined UInteger Var(la_branch_cost) Save -mbranch-cost=COST Set the cost of branches to roughly COST instructions. +maddr-reg-reg-cost= +Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save +-maddr-reg-reg-cost=COST Set the cost of ADDRESS_REG_REG to the value calculated by COST. + mcheck-zero-division Target Mask(CHECK_ZERO_DIV) Save Trap on integer divide by zero. diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index b0271eb3b9a..5f235a04ef2 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () movcf2gr (COSTS_N_INSNS (7)), movgr2cf (COSTS_N_INSNS (15)), branch_cost (6), +addr_reg_reg_cost (3), memory_latency (4) {} /* The following properties cannot be looked up directly using "cpucfg". 
diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index 36342cc9373..c2a63f75fc2 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target *target, if (!opts_set->x_la_branch_cost) opts->x_la_branch_cost = loongarch_cost->branch_cost; + if (!opts_set->x_la_addr_reg_reg_cost) +opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost; + /* other stuff */ if (ABI_LP64_P (target->abi.base)) opts->x_flag_pcc_struct_return = 0; diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h index e69173ebf79..f7819fe7678 100644 --- a/gcc/config/loongarch/loongarch-tune.h +++ b/gcc/config/loongarch/loongarch-tune.h @@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data unsigned short movcf2gr; unsigned short movgr2cf; unsigned short branch_cost; + unsigned short addr_reg_reg_cost; unsigned short memory_latency; /* Default RTX cost initializer, implemented in loongarch-def.cc. */ @@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data return *this; } + loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short _addr_reg_reg_cost) + { +addr_reg_reg_cost = _addr_reg_reg_cost; +return *this; + } + loongarch_rtx_cost_data memory_latency_ (unsigned short _memory_latency) { memory_latency = _memory_latency; diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index e9978370e8c..495b62309d6 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -238
Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]
On Thu, Feb 13, 2025 at 12:48:44PM +0100, Richard Biener wrote:
> So what this basically does is ensure we mark DECL_VALUE_EXPR when
> VAR is marked which isn't done when marking a tree node.
>
> That you special-case the hashtable walker is a workaround for
> us not being able to say
>
>   struct GTY((mark_extra_stuff)) tree_decl_with_vis {
>
> on 'tree' (or specifically the structs for a VAR_DECL).  And that we
> rely on gengtype producing the 'tree' marker.  So we rely on the
> hashtable keeping referenced trees live.

Yes, we could just arrange for gt_ggc_mx_lang_tree_node to additionally
mark DECL_VALUE_EXPR for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set (dunno
how exactly).  I think what the patch does should be slightly cheaper:
we avoid those DECL_VALUE_EXPR hash table lookups in the common case
where DECL_VALUE_EXPR of marked variables just refers to trees which
reference only marked VAR_DECLs and no unmarked ones.

	Jakub
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 14:17, Robin Dapp wrote:
>>>> Other thoughts?
>>> The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
>>> we can't/don't model in the pipeline, but I have no idea how to model
>>> the VL=0 case there.
>> Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
>> the first time a hook got (ab)used in ways that weren't part of the
>> original intent.
> I don't fully understand what's happening.  So the hoisting is being done
> speculatively here?  And it just happens to be "bad" because that might
> cause a VL=0 case.  But are we sure a lack of speculation cannot cause
> such cases?

Exactly.  My gut feeling w/o a deep dive was this seemed like papering
over the issue.

BTW, what exactly is speculative scheduling?  As in, what is it actually
trying to schedule ahead?

> Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
> understand the problem more thoroughly before changing things.
> In the end LCM minimizes the number of vsetvls and inserts them at the
> "earliest" point.  If that is not sufficient I'd say we need modify
> the constraints (maybe on a per-uarch basis)?

As far as LCM is concerned it is hoisting the insn to the optimal spot.
However there's some additional logic such as in can_use_next_avl_p ()
which influences if things can be moved around.

> On a separate note: How about we move the vsetvl pass after sched2?
> Then we could at least rely on LCM doing its work uninhibited and wouldn't
> reorder vsetvls afterwards.

Bingo!  Excellent idea.  This would ensure scheduling doesn't undo
carefully placed stuff, but ...

> Or do we somehow rely on rtl_dce and BB reorder to run afterwards?

... I have no idea if any of this is in play.

> That won't help with the problem here but might with others.

Right, this needs to be evaluated independently with both icounts and
BPI3 runs to see if anything falls out.

-Vineet
[PATCH v3 2/4] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.
Split the implementation of the function loongarch_cpu_cpp_builtins into two parts: 1. Macro definitions that do not change (only considering 64-bit architecture) 2. Macro definitions that change with different compilation options. gcc/ChangeLog: * config/loongarch/loongarch-c.cc (builtin_undef): New macro. (loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins and loongarch_define_unconditional_macros. (loongarch_def_or_undef): New functions. (loongarch_define_unconditional_macros): Likewise. (loongarch_update_cpp_builtins): Likewise. --- gcc/config/loongarch/loongarch-c.cc | 122 ++-- 1 file changed, 77 insertions(+), 45 deletions(-) diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 5d8c02e094b..9a8de1ec381 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -31,26 +31,22 @@ along with GCC; see the file COPYING3. If not see #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) #define builtin_define(TXT) cpp_define (pfile, TXT) +#define builtin_undef(TXT) cpp_undef (pfile, TXT) #define builtin_assert(TXT) cpp_assert (pfile, TXT) -void -loongarch_cpu_cpp_builtins (cpp_reader *pfile) +static void +loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile) { - builtin_assert ("machine=loongarch"); - builtin_assert ("cpu=loongarch"); - builtin_define ("__loongarch__"); - - builtin_define_with_value ("__loongarch_arch", -loongarch_arch_strings[la_target.cpu_arch], 1); - - builtin_define_with_value ("__loongarch_tune", -loongarch_tune_strings[la_target.cpu_tune], 1); - - builtin_define_with_value ("_LOONGARCH_ARCH", -loongarch_arch_strings[la_target.cpu_arch], 1); + if (def_p) +cpp_define (pfile, macro); + else +cpp_undef (pfile, macro); +} - builtin_define_with_value ("_LOONGARCH_TUNE", -loongarch_tune_strings[la_target.cpu_tune], 1); +static void +loongarch_define_unconditional_macros (cpp_reader *pfile) +{ + builtin_define 
("__loongarch__"); /* Base architecture / ABI. */ if (TARGET_64BIT) @@ -66,6 +62,48 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define ("__loongarch_lp64"); } + /* Add support for FLOAT128_TYPE on the LoongArch architecture. */ + builtin_define ("__FLOAT128_TYPE__"); + + /* Map the old _Float128 'q' builtins into the new 'f128' builtins. */ + builtin_define ("__builtin_fabsq=__builtin_fabsf128"); + builtin_define ("__builtin_copysignq=__builtin_copysignf128"); + builtin_define ("__builtin_nanq=__builtin_nanf128"); + builtin_define ("__builtin_nansq=__builtin_nansf128"); + builtin_define ("__builtin_infq=__builtin_inff128"); + builtin_define ("__builtin_huge_valq=__builtin_huge_valf128"); + + /* Native Data Sizes. */ + builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE); + builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE); + builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE); + builtin_define_with_int_value ("_LOONGARCH_FPSET", 32); + builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32); +} + +static void +loongarch_update_cpp_builtins (cpp_reader *pfile) +{ + /* Since the macros in this function might be redefined, it's necessary to + undef them first.*/ + builtin_undef ("__loongarch_arch"); + builtin_define_with_value ("__loongarch_arch", +loongarch_arch_strings[la_target.cpu_arch], 1); + + builtin_undef ("__loongarch_tune"); + builtin_define_with_value ("__loongarch_tune", +loongarch_tune_strings[la_target.cpu_tune], 1); + + builtin_undef ("_LOONGARCH_ARCH"); + builtin_define_with_value ("_LOONGARCH_ARCH", +loongarch_arch_strings[la_target.cpu_arch], 1); + + builtin_undef ("_LOONGARCH_TUNE"); + builtin_define_with_value ("_LOONGARCH_TUNE", +loongarch_tune_strings[la_target.cpu_tune], 1); + + builtin_undef ("__loongarch_double_float"); + builtin_undef ("__loongarch_single_float"); /* These defines reflect the ABI in use, not whether the FPU is directly accessible. 
*/ if (TARGET_DOUBLE_FLOAT_ABI) @@ -73,6 +111,8 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) else if (TARGET_SINGLE_FLOAT_ABI) builtin_define ("__loongarch_single_float=1"); + builtin_undef ("__loongarch_soft_float"); + builtin_undef ("__loongarch_hard_float"); if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI) builtin_define ("__loongarch_hard_float=1"); else @@ -80,6 +120,7 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) /* ISA Extensions. */ + builtin_undef ("__loongarch_frlen"); if (TARGET_DOUBLE_FLOAT) builtin_define ("__loongarch_frlen=64"); else if (TARGET_SINGLE_FLOAT) @@ -87
[PATCH v3 0/4] Organize the code and fix PR118828 and PR118843.
v1 -> v2:
  1. Move __loongarch_{arch,tune}, _LOONGARCH_{ARCH,TUNE},
     __loongarch_{div32,am_bh,amcas,ld_seq_sa} and
     __loongarch_version_major/__loongarch_version_minor to the update
     function.
  2. Fixed PR118843.
  3. Add testsuites.

v2 -> v3:
  1. Modify test cases (pr118828-3.c pr118828-4.c).

Lulu Cheng (4):
  LoongArch: Move the function loongarch_register_pragmas to
    loongarch-c.cc.
  LoongArch: Split the function loongarch_cpu_cpp_builtins into two
    functions.
  LoongArch: After setting the compilation options, update the
    predefined macros.
  LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined
    [PR118843].

 gcc/config/loongarch/loongarch-c.cc           | 204 +-
 gcc/config/loongarch/loongarch-protos.h       |   1 +
 gcc/config/loongarch/loongarch-target-attr.cc |  48 -
 .../gcc.target/loongarch/pr118828-2.c         |  30 +++
 .../gcc.target/loongarch/pr118828-3.c         |  32 +++
 .../gcc.target/loongarch/pr118828-4.c         |  32 +++
 gcc/testsuite/gcc.target/loongarch/pr118828.c |  34 +++
 gcc/testsuite/gcc.target/loongarch/pr118843.c |   6 +
 8 files changed, 287 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c

-- 
2.34.1
[PATCH v3 3/4] LoongArch: After setting the compilation options, update the predefined macros.
PR target/118828 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): Update the predefined macros. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118828.c: New test. * gcc.target/loongarch/pr118828-2.c: New test. * gcc.target/loongarch/pr118828-3.c: New test. * gcc.target/loongarch/pr118828-4.c: New test. --- gcc/config/loongarch/loongarch-c.cc | 14 .../gcc.target/loongarch/pr118828-2.c | 30 .../gcc.target/loongarch/pr118828-3.c | 32 + .../gcc.target/loongarch/pr118828-4.c | 32 + gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 +++ 5 files changed, 142 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 9a8de1ec381..66ae77ad665 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see #include "tm.h" #include "c-family/c-common.h" #include "cpplib.h" +#include "c-family/c-pragma.h" #include "tm_p.h" #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) @@ -212,6 +213,19 @@ loongarch_pragma_target_parse (tree args, tree pop_target) loongarch_reset_previous_fndecl (); + /* For the definitions, ensure all newly defined macros are considered + as used for -Wunused-macros. There is no point warning about the + compiler predefined macros. 
*/
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+  cpp_opts->warn_unused_macros = 0;
+
+  cpp_force_token_locations (parse_in, BUILTINS_LOCATION);
+  loongarch_update_cpp_builtins (parse_in);
+  cpp_stop_forcing_token_locations (parse_in);
+
+  cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+
   /* If we're popping or reseting make sure to update the globals so that
      the optab availability predicates get recomputed.  */
   if (pop_target)
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-2.c b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
new file mode 100644
index 000..3d32fcc15c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
@@ -0,0 +1,30 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mno-lsx" } */
+
+#ifdef __loongarch_sx
+#error LSX should not be available here
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should not be available here
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("lsx")
+#ifndef __loongarch_sx
+#error LSX should be available here
+#endif
+#ifndef __loongarch_simd_width
+#error simd width should be available here
+#elif __loongarch_simd_width != 128
+#error simd width should be 128
+#endif
+#pragma GCC pop_options
+
+#ifdef __loongarch_sx
+#error LSX should become unavailable again
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should become unavailable again
+#endif
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
new file mode 100644
index 000..31ab8e59a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { scan-assembler "t1: loongarch64" } } */
+/* { dg-final { scan-assembler "t2: la64v1.1" } } */
+/* { dg-final { scan-assembler "t3: loongarch64" } } */
+
+#ifndef __loongarch_arch
+#error __loongarch_arch should
be available here +#endif + +void +t1 (void) +{ + asm volatile ("# t1: " __loongarch_arch); +} + +#pragma GCC push_options +#pragma GCC target("arch=la64v1.1") + +void +t2 (void) +{ + asm volatile ("# t2: " __loongarch_arch); +} + +#pragma GCC pop_options + +void +t3 (void) +{ + asm volatile ("# t3: " __loongarch_arch); +} diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-4.c b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c new file mode 100644 index 000..77587ee5614 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=loongarch64 -mtune=la464" } */ +/* { dg-final { scan-assembler "t1: la464" } } */ +/* { dg-final { scan-assembler "t2: la664" } } */ +/* { dg-final { scan-assembler "t3: la464" } } */ + +#ifndef __loongarch_tune +#error __loongarch_tune should be available here +#endif + +void +t1 (void) +{ + asm volatile ("# t1: " __loongarch_tune); +} + +#pragma GCC push_options +#pragma GCC target("tune=la664") + +void +t2 (void) +{ + asm volatile ("# t2: " __loongarch_tune); +} + +#pragma GCC pop_options
[PATCH v3 4/4] LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843].
PR target/118843 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_update_cpp_builtins): Fix macro definition issues. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118843.c: New test. --- gcc/config/loongarch/loongarch-c.cc | 27 ++- gcc/testsuite/gcc.target/loongarch/pr118843.c | 6 + 2 files changed, 21 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 66ae77ad665..effdcf0e255 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -129,9 +129,6 @@ loongarch_update_cpp_builtins (cpp_reader *pfile) else builtin_define ("__loongarch_frlen=0"); - loongarch_def_or_undef (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE, - "__loongarch_frecipe", pfile); - loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_simd", pfile); loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_sx", pfile); loongarch_def_or_undef (ISA_HAS_LASX, "__loongarch_asx", pfile); @@ -149,17 +146,23 @@ loongarch_update_cpp_builtins (cpp_reader *pfile) int max_v_major = 1, max_v_minor = 0; for (int i = 0; i < N_EVO_FEATURES; i++) -if (la_target.isa.evolution & la_evo_feature_masks[i]) - { - builtin_define (la_evo_macro_name[i]); +{ + builtin_undef (la_evo_macro_name[i]); - int major = la_evo_version_major[i], - minor = la_evo_version_minor[i]; + if (la_target.isa.evolution & la_evo_feature_masks[i] + && (la_evo_feature_masks[i] != OPTION_MASK_ISA_FRECIPE + || TARGET_HARD_FLOAT)) + { + builtin_define (la_evo_macro_name[i]); - max_v_major = major > max_v_major ? major : max_v_major; - max_v_minor = major == max_v_major - ? (minor > max_v_minor ? minor : max_v_minor) : max_v_minor; - } + int major = la_evo_version_major[i], + minor = la_evo_version_minor[i]; + + max_v_major = major > max_v_major ? major : max_v_major; + max_v_minor = major == max_v_major + ? (minor > max_v_minor ? 
minor : max_v_minor) : max_v_minor;
+	}
+    }
 
   /* Find the minimum ISA version required to run the target program.  */
   builtin_undef ("__loongarch_version_major");
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118843.c b/gcc/testsuite/gcc.target/loongarch/pr118843.c
new file mode 100644
index 000..30372b8ffe6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118843.c
@@ -0,0 +1,6 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mfrecipe -mfpu=none" } */
+
+#ifdef __loongarch_frecipe
+#error __loongarch_frecipe should not be available here
+#endif
-- 
2.34.1
[PATCH v3 1/4] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.
gcc/ChangeLog: * config/loongarch/loongarch-target-attr.cc (loongarch_pragma_target_parse): Move to ... (loongarch_register_pragmas): Move to ... * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): ... here. (loongarch_register_pragmas): ... here. * config/loongarch/loongarch-protos.h (loongarch_process_target_attr): Function Declaration. --- gcc/config/loongarch/loongarch-c.cc | 51 +++ gcc/config/loongarch/loongarch-protos.h | 1 + gcc/config/loongarch/loongarch-target-attr.cc | 48 - 3 files changed, 52 insertions(+), 48 deletions(-) diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index c95c0f373be..5d8c02e094b 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -23,9 +23,11 @@ along with GCC; see the file COPYING3. If not see #include "config.h" #include "system.h" #include "coretypes.h" +#include "target.h" #include "tm.h" #include "c-family/c-common.h" #include "cpplib.h" +#include "tm_p.h" #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) #define builtin_define(TXT) cpp_define (pfile, TXT) @@ -145,3 +147,52 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32); } + +/* Hook to validate the current #pragma GCC target and set the state, and + update the macros based on what was changed. If ARGS is NULL, then + POP_TARGET is used to reset the options. */ + +static bool +loongarch_pragma_target_parse (tree args, tree pop_target) +{ + /* If args is not NULL then process it and setup the target-specific + information that it specifies. */ + if (args) +{ + if (!loongarch_process_target_attr (args, NULL)) + return false; + + loongarch_option_override_internal (&la_target, + &global_options, + &global_options_set); +} + + /* args is NULL, restore to the state described in pop_target. */ + else +{ + pop_target = pop_target ? 
pop_target : target_option_default_node; + cl_target_option_restore (&global_options, &global_options_set, + TREE_TARGET_OPTION (pop_target)); +} + + target_option_current_node += build_target_option_node (&global_options, &global_options_set); + + loongarch_reset_previous_fndecl (); + + /* If we're popping or reseting make sure to update the globals so that + the optab availability predicates get recomputed. */ + if (pop_target) +loongarch_save_restore_target_globals (pop_target); + + return true; +} + +/* Implement REGISTER_TARGET_PRAGMAS. */ + +void +loongarch_register_pragmas (void) +{ + /* Update pragma hook to allow parsing #pragma GCC target. */ + targetm.target_option.pragma_parse = loongarch_pragma_target_parse; +} diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index b99f949a004..e7b318143bf 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -219,4 +219,5 @@ extern void loongarch_option_override_internal (struct loongarch_target *, struc extern void loongarch_reset_previous_fndecl (void); extern void loongarch_save_restore_target_globals (tree new_tree); extern void loongarch_register_pragmas (void); +extern bool loongarch_process_target_attr (tree args, tree fndecl); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch-target-attr.cc b/gcc/config/loongarch/loongarch-target-attr.cc index cee7031ca1e..cb537446dff 100644 --- a/gcc/config/loongarch/loongarch-target-attr.cc +++ b/gcc/config/loongarch/loongarch-target-attr.cc @@ -422,51 +422,3 @@ loongarch_option_valid_attribute_p (tree fndecl, tree, tree args, int) return ret; } -/* Hook to validate the current #pragma GCC target and set the state, and - update the macros based on what was changed. If ARGS is NULL, then - POP_TARGET is used to reset the options. 
*/ - -static bool -loongarch_pragma_target_parse (tree args, tree pop_target) -{ - /* If args is not NULL then process it and setup the target-specific - information that it specifies. */ - if (args) -{ - if (!loongarch_process_target_attr (args, NULL)) - return false; - - loongarch_option_override_internal (&la_target, - &global_options, - &global_options_set); -} - - /* args is NULL, restore to the state described in pop_target. */ - else -{ - pop_target = pop_target ? pop_target : target_option_default_node; - cl_target_option_restore (&global_options, &global_options_set, - TREE_TARGET_OPTION (pop_target)); -} - - target_option_current_node -= build_target_option_node (&globa
Re: [PATCH htdocs] bugs: mention ASAN too
Gerald Pfeifer writes:
> On Mon, 11 Nov 2024, Sam James wrote:
>> Request that reporters try `-fsanitize=address,undefined` rather than
>> just `-fsanitize=undefined` when reporting bugs.  We get invalid bug
>> reports which ASAN would've caught sometimes, even if it's less often
>> than where UBSAN would help.
>
> I don't have a strong opinion on this and would prefer someone else to
> chime in.  That said, if we don't hear from someone else by early next
> week, please go ahead and push.

Done now - sorry, this had slipped my mind.

> Just one (naive) question: Are there instances where -fsanitize=undefined
> may be available/working where -fsanitize=address,undefined may be not?
>
> If so, perhaps provide both invocations as in
>     -fsanitize=undefined or -fsanitize=address,un...
> ?
>
> Your call; just a thought.

It's a good question - AFAIK there aren't any such cases.  It is
possible, but rather remote, that the instrumentation from one *but not*
the other inhibits a compiler bug in some cases (or just user UB).

I can include both if you think that's worth doing, but I tend to think
it'll make the text too verbose.

> Gerald
>
>>  htdocs/bugs/index.html | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
>> index c7d2f310..d6556b26 100644
>> --- a/htdocs/bugs/index.html
>> +++ b/htdocs/bugs/index.html
>> @@ -52,7 +52,7 @@ try a current release or development snapshot.
>>  with gcc -Wall -Wextra and see whether this shows anything
>>  wrong with your code.  Similarly, if compiling with
>>  -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations
>> -makes a difference, or if compiling with -fsanitize=undefined
>> +makes a difference, or if compiling with -fsanitize=address,undefined
>>  produces any run-time errors, then your code is probably not correct.
Re: [PATCH 0/2] x86: Add a pass to fold tail call
On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu wrote: > > x86 conditional branch (jcc) target can be either a label or a symbol. > Add a pass to fold tail call with jcc by turning: > > jcc .L6 > ... > .L6: > jmp tailcall > > into: > > jcc tailcall > > After basic block reordering pass, conditional branches look like > > (jump_insn 7 6 14 2 (set (pc) > (if_then_else (eq (reg:CCZ 17 flags) > (const_int 0 [0])) > (label_ref:DI 23) > (pc))) "x.c":8:5 1458 {jcc} > (expr_list:REG_DEAD (reg:CCZ 17 flags) > (int_list:REG_BR_PROB 217325348 (nil))) > ... > (code_label 23 20 8 4 4 (nil) [1 uses]) > (note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] > on_decl 0x7f4cff3c0b00 bar>) [0 bar S1 A8]) > (const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di} > (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41] > l 0x7f4cff3c0b00 bar>) > (nil)) > (nil)) > > If the branch edge destination is a basic block with only a direct > sibcall, change the jcc target to the sibcall target and decrement > the destination basic block entry label use count. Even though the > destination basic block is unused, it must be kept since it is required > by RTL control flow check and JUMP_LABEL of the conditional jump can > only point to a code label, not a code symbol. Dummy sibcall patterns > are added so that sibcalls in basic blocks, whose entry label use count > is 0, won't be generated. This reads like you are trying to get around some checks in RTL control flow. So, either changes you are performing to RTX stream are not allowed (these checks are here for a reason), or the infrastructure is not (yet) prepared to handle this functionality. Either way, please discuss with infrastructure maintainers (CC'd) first if the approach is correct and if these changes to RTX stream are allowed by the infra. Thanks, Uros. 
> > Jump tables like > > foo: > .cfi_startproc > cmpl$4, %edi > ja .L1 > movl%edi, %edi > jmp *.L4(,%rdi,8) > .section.rodata > .L4: > .quad .L8 > .quad .L7 > .quad .L6 > .quad .L5 > .quad .L3 > .text > .L5: > jmp bar3 > .L3: > jmp bar4 > .L8: > jmp bar0 > .L7: > jmp bar1 > .L6: > jmp bar2 > .L1: > ret > .cfi_endproc > > can also be changed to: > > foo: > .cfi_startproc > cmpl$4, %edi > ja .L1 > movl%edi, %edi > jmp *.L4(,%rdi,8) > .section.rodata > .L4: > .quad bar0 > .quad bar1 > .quad bar2 > .quad bar3 > .quad bar4 > .text > .L1: > ret > .cfi_endproc > > After basic block reordering pass, jump tables look like: > > (jump_table_data 16 15 17 (addr_vec:DI [ > (label_ref:DI 18) > (label_ref:DI 22) > (label_ref:DI 26) > (label_ref:DI 30) > (label_ref:DI 34) > ])) > ... > (code_label 30 17 31 4 5 (nil) [1 uses]) > (note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41] > ) [0 bar3 S1 A8]) > (const_int 0 [0])) "j.c":15:13 1469 {sibcall_di} > (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41] > ) > (nil)) > (nil)) > > If the jump table entry points to a target basic block with only a direct > sibcall, change the entry to point to the sibcall target and decrement > the target basic block entry label use count. If the target basic block > isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its > entry label use count is 0. > > Update final_scan_insn_1 to skip a label if its use count is 0 and > support symbol reference in jump table. Update create_trace_edges to > skip symbol reference in jump table. > > H.J. 
Lu (2): > x86: Add a pass to fold tail call > x86: Fold sibcall targets into jump table > > gcc/config/i386/i386-features.cc | 274 + > gcc/config/i386/i386-passes.def| 1 + > gcc/config/i386/i386-protos.h | 3 + > gcc/config/i386/i386.cc| 12 + > gcc/config/i386/i386.md| 57 - > gcc/config/i386/predicates.md | 4 + > gcc/dwarf2cfi.cc | 7 +- > gcc/final.cc | 26 +- > gcc/testsuite/gcc.target/i386/pr14721-1a.c | 54 > gcc/testsuite/gcc.target/i386/pr14721-1b.c | 37 +++ > gcc/testsuite/gcc.target/i386/pr14721-1c.c | 37 +++ > gcc/testsuite/gcc.target/i386/pr14721-2a.c | 58 + > gcc/testsuite/gcc.target/i386/pr14721-2b.c | 41 +++ > gcc/testsuite/gcc.target/i386/pr14721-2c.c | 43 > gcc/testsuite/
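For readers following the cover letter, the shape of source code that produces the jcc-plus-sibcall pattern can be sketched in plain C/C++ (a hypothetical example; the names foo/bar only mirror the RTL dump above, and the "typically" caveats are mine):

```cpp
// Hypothetical source producing the jcc + sibcall shape from the cover
// letter.  At -O2 on x86-64, GCC typically compiles the call to bar() as
// a sibling call ("jmp bar") in a separate basic block reached via a
// conditional branch -- i.e. the "jcc .L6 / ... / .L6: jmp bar" sequence
// the proposed pass would fold into "jcc bar".
int calls;            // instrumentation for testing; not part of the example

extern "C" void bar(void)
{
  ++calls;
}

extern "C" void foo(int x)
{
  if (x == 0)
    bar();            // tail position: usually emitted as a sibcall
}
```
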
[wwwdocs,applied] Mention -mno-call-main
Applied the following avr news to gcc-15: diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index 7638d3d5..41425257 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -500,6 +500,10 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;" >-msplit-ldst and href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#index-msplit-bit-shift"; >-msplit-bit-shift. + Support has been added for the new option +href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#index-mno-call-main"; + >-mno-call-main. Instead of calling main, +it will be located in section .init9. IA-32/x86-64 Johann
Re: [PATCH 0/2] x86: Add a pass to fold tail call
On Thu, Feb 13, 2025 at 5:31 PM Uros Bizjak wrote: > > On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu wrote: > > > > x86 conditional branch (jcc) target can be either a label or a symbol. > > Add a pass to fold tail call with jcc by turning: > > > > jcc .L6 > > ... > > .L6: > > jmp tailcall > > > > into: > > > > jcc tailcall > > > > After basic block reordering pass, conditional branches look like > > > > (jump_insn 7 6 14 2 (set (pc) > > (if_then_else (eq (reg:CCZ 17 flags) > > (const_int 0 [0])) > > (label_ref:DI 23) > > (pc))) "x.c":8:5 1458 {jcc} > > (expr_list:REG_DEAD (reg:CCZ 17 flags) > > (int_list:REG_BR_PROB 217325348 (nil))) > > ... > > (code_label 23 20 8 4 4 (nil) [1 uses]) > > (note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > > (call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] > > > on_decl 0x7f4cff3c0b00 bar>) [0 bar S1 A8]) > > (const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di} > > (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41] > > > l 0x7f4cff3c0b00 bar>) > > (nil)) > > (nil)) > > > > If the branch edge destination is a basic block with only a direct > > sibcall, change the jcc target to the sibcall target and decrement > > the destination basic block entry label use count. Even though the > > destination basic block is unused, it must be kept since it is required > > by RTL control flow check and JUMP_LABEL of the conditional jump can > > only point to a code label, not a code symbol. Dummy sibcall patterns > > are added so that sibcalls in basic blocks, whose entry label use count > > is 0, won't be generated. > > This reads like you are trying to get around some checks in RTL > control flow. So, either changes you are performing to RTX stream are > not allowed (these checks are here for a reason), or the > infrastructure is not (yet) prepared to handle this functionality. 
The main issue is that because JUMP_LABEL of the conditional jump can point to a code label, not a code symbol, I have no choice but keep it even if it is unused. If the infrastructure allows a symbol reference in all places where a label reference is allowed, only x86 backend changes are needed. BTW, some targets, like arm, don't set use count on referenced labels. I will add a target hook to opt-out the zero use count label. > Either way, please discuss with infrastructure maintainers (CC'd) > first if the approach is correct and if these changes to RTX stream > are allowed by the infra. > > Thanks, > Uros. > > > > > Jump tables like > > > > foo: > > .cfi_startproc > > cmpl$4, %edi > > ja .L1 > > movl%edi, %edi > > jmp *.L4(,%rdi,8) > > .section.rodata > > .L4: > > .quad .L8 > > .quad .L7 > > .quad .L6 > > .quad .L5 > > .quad .L3 > > .text > > .L5: > > jmp bar3 > > .L3: > > jmp bar4 > > .L8: > > jmp bar0 > > .L7: > > jmp bar1 > > .L6: > > jmp bar2 > > .L1: > > ret > > .cfi_endproc > > > > can also be changed to: > > > > foo: > > .cfi_startproc > > cmpl$4, %edi > > ja .L1 > > movl%edi, %edi > > jmp *.L4(,%rdi,8) > > .section.rodata > > .L4: > > .quad bar0 > > .quad bar1 > > .quad bar2 > > .quad bar3 > > .quad bar4 > > .text > > .L1: > > ret > > .cfi_endproc > > > > After basic block reordering pass, jump tables look like: > > > > (jump_table_data 16 15 17 (addr_vec:DI [ > > (label_ref:DI 18) > > (label_ref:DI 22) > > (label_ref:DI 26) > > (label_ref:DI 30) > > (label_ref:DI 34) > > ])) > > ... 
> > (code_label 30 17 31 4 5 (nil) [1 uses]) > > (note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > > (call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41] > > ) [0 bar3 S1 A8]) > > (const_int 0 [0])) "j.c":15:13 1469 {sibcall_di} > > (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41] > > ) > > (nil)) > > (nil)) > > > > If the jump table entry points to a target basic block with only a direct > > sibcall, change the entry to point to the sibcall target and decrement > > the target basic block entry label use count. If the target basic block > > isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its > > entry label use count is 0. > > > > Update final_scan_insn_1 to skip a label if its use count is 0 and > > support symbol reference in jump table. Update create_trace_edges to > > skip symbol reference in jump table. > > > > H.J. Lu (2): > > x86: Add a pass to fold tail call > > x86: Fold sibcall targets into jump table > > > > gcc/config/i386/i386-features.cc
[PATCH v2 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes
Although it is just a special case of "a widening product whose result is used for reduction," having these standard names allows the dot product pattern to be recognized earlier, which may be beneficial to optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chain-2.c - gcc.dg/vect/vect-reduc-chain-3.c - gcc.dg/vect/vect-reduc-chain-dot-slp-3.c - gcc.dg/vect/vect-reduc-chain-dot-slp-4.c gcc/ChangeLog: * config/loongarch/simd.md (wvec_half): New define_mode_attr. (dot_prod): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan DOT_PROD_EXPR in optimized tree. --- gcc/config/loongarch/simd.md | 29 +++ .../gcc.target/loongarch/wide-mul-reduc-2.c | 3 +- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index 661f5dc8dda..45d2bcaec2e 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -90,6 +90,12 @@ (define_mode_attr WVEC_HALF [(V2DI "V1TI") (V4DI "V2TI") (V8HI "V4SI") (V16HI "V8SI") (V16QI "V8HI") (V32QI "V16HI")]) +;; Lower-case version. +(define_mode_attr wvec_half [(V2DI "v1ti") (V4DI "v2ti") +(V4SI "v2di") (V8SI "v4di") +(V8HI "v4si") (V16HI "v8si") +(V16QI "v8hi") (V32QI "v16hi")]) + ;; Integer vector modes with the same length and unit size as a mode.
(define_mode_attr VIMODE [(V2DI "V2DI") (V4SI "V4SI") (V8HI "V8HI") (V16QI "V16QI") @@ -786,6 +792,29 @@ (define_expand "_vmaddw__" DONE; }) +(define_expand "dot_prod" + [(match_operand: 0 "register_operand" "=f,f") + (match_operand:IVEC 1 "register_operand" " f,f") + (match_operand:IVEC 2 "register_operand" " f,f") + (match_operand: 3 "reg_or_0_operand" " 0,YG") + (any_extend (const_int 0))] + "" +{ + auto [op0, op1, op2, op3] = operands; + + if (op3 == CONST0_RTX (mode)) +emit_insn ( + gen__vmulwev__ (op0, op1, op2)); + else +emit_insn ( + gen__vmaddwev__ (op0, op3, op1, + op2)); + + emit_insn ( +gen__vmaddwod__ (op0, op0, op1, op2)); + DONE; +}) + (define_insn "simd_maddw_evod__hetero" [(set (match_operand: 0 "register_operand" "=f") (plus: diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c index 07a7601888a..61e92e58fc3 100644 --- a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c +++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mlasx" } */ +/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */ /* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */ +/* { dg-final { scan-tree-dump "DOT_PROD_EXPR" "optimized" } } */ typedef __INT32_TYPE__ i32; typedef __INT64_TYPE__ i64; -- 2.48.1
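Since the expander above combines an even-lane widening multiply(-add) with an odd-lane widening multiply-add, a scalar model may help make the decomposition concrete (a sketch with invented helper names modeling the vmulwev/vmaddwev/vmaddwod semantics, not LoongArch intrinsics):

```cpp
#include <cstdint>
#include <cstddef>

// Model of vmulwev/vmaddwev: accumulate widened products of even lanes.
int64_t maddw_even(int64_t acc, const int32_t *x, const int32_t *y, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    acc += (int64_t) x[i] * y[i];
  return acc;
}

// Model of vmaddwod: accumulate widened products of odd lanes.
int64_t maddw_odd(int64_t acc, const int32_t *x, const int32_t *y, size_t n)
{
  for (size_t i = 1; i < n; i += 2)
    acc += (int64_t) x[i] * y[i];
  return acc;
}

// dot_prod(x, y, init) = init + sum_i (int64_t) x[i] * y[i], split the
// way the expander splits it: the even-lane step first (a plain widening
// multiply when init is zero), then the odd-lane multiply-add.
int64_t dot_prod_model(const int32_t *x, const int32_t *y, size_t n,
                       int64_t init)
{
  int64_t acc = maddw_even(init, x, y, n);   // vmulwev / vmaddwev step
  return maddw_odd(acc, x, y, n);            // vmaddwod step
}
```
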
[PATCH v2 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes
Since PR116142 has been fixed, we can now add the standard names so the compiler will generate better code if the result of a widening product is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/wide-mul-reduc-1.c: New test. * gcc.target/loongarch/wide-mul-reduc-2.c: New test. --- gcc/config/loongarch/simd.md | 16 .../gcc.target/loongarch/wide-mul-reduc-1.c| 18 ++ .../gcc.target/loongarch/wide-mul-reduc-2.c| 17 + 3 files changed, 51 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index b7a28f7b3f2..661f5dc8dda 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -630,6 +630,7 @@ (define_expand "cbranch4" ;; Operations on elements at even/odd indices. (define_int_iterator zero_one [0 1]) (define_int_attr ev_od [(0 "ev") (1 "od")]) +(define_int_attr even_odd [(0 "even") (1 "odd")]) ;; Integer widening add/sub/mult.
(define_insn "simd_w_evod__" @@ -665,6 +666,21 @@ (define_expand "_vw__" DONE; }) +(define_expand "vec_widen_mult__" + [(match_operand: 0 "register_operand" "=f") + (match_operand:IVEC 1 "register_operand" " f") + (match_operand:IVEC 2 "register_operand" " f") + (any_extend (const_int 0)) + (const_int zero_one)] + "" +{ + emit_insn ( +gen__vmulw__ (operands[0], +operands[1], +operands[2])); + DONE; +}) + (define_insn "simd_w_evod__hetero" [(set (match_operand: 0 "register_operand" "=f") (addsubmul: diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c new file mode 100644 index 000..d6e0da59dc4 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump "WIDEN_MULT_EVEN_EXPR" "optimized" } } */ +/* { dg-final { scan-tree-dump "WIDEN_MULT_ODD_EXPR" "optimized" } } */ + +typedef __INT32_TYPE__ i32; +typedef __INT64_TYPE__ i64; + +i32 x[8], y[8]; + +i64 +test (void) +{ + i64 ret = 0; + for (int i = 0; i < 8; i++) +ret ^= (i64) x[i] * y[i]; + return ret; +} diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c new file mode 100644 index 000..07a7601888a --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx" } */ +/* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */ + +typedef __INT32_TYPE__ i32; +typedef __INT64_TYPE__ i64; + +i32 x[8], y[8]; + +i64 +test (void) +{ + i64 ret = 0; + for (int i = 0; i < 8; i++) +ret += (i64) x[i] * y[i]; + return ret; +} -- 2.48.1
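As a quick reference for what the new standard names compute, here is a scalar model of the even/odd widening-multiply semantics (illustrative names of mine, not actual optab interfaces). Lane-order-insensitive reductions like the XOR loop in wide-mul-reduc-1.c are why the even/odd lane pairing is usable here:

```cpp
#include <cstdint>
#include <cstddef>

// Model of vec_widen_mult_even_*: out[k] = (int64_t) a[2k] * b[2k],
// producing a half-length vector of widened products of the even lanes.
void widen_mult_even(const int32_t *a, const int32_t *b, int64_t *out,
                     size_t n)
{
  for (size_t k = 0; k < n / 2; ++k)
    out[k] = (int64_t) a[2 * k] * b[2 * k];
}

// Model of vec_widen_mult_odd_*: out[k] = (int64_t) a[2k+1] * b[2k+1].
void widen_mult_odd(const int32_t *a, const int32_t *b, int64_t *out,
                    size_t n)
{
  for (size_t k = 0; k < n / 2; ++k)
    out[k] = (int64_t) a[2 * k + 1] * b[2 * k + 1];
}
```

Together the two half-vectors cover every lane exactly once, just in a permuted order, which is irrelevant to a commutative reduction.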
Re: [PATCH] libstdc++: Implement P3138R5 views::cache_latest
On Tue, 11 Feb 2025 at 05:59, Patrick Palka wrote: > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > > -- >8 -- > > libstdc++-v3/ChangeLog: > > * include/bits/version.def (ranges_cache_latest): Define. > * include/bits/version.h: Regenerate. > * include/std/ranges (cache_latest_view): Define for C++26. > (cache_latest_view::_Iterator): Likewise. > (cache_latest_view::_Sentinel): Likewise. > (views::__detail::__can_cache_latest): Likewise. > (views::_CacheLatest, views::cache_latest): Likewise. > * testsuite/std/ranges/adaptors/cache_latest/1.cc: New test. The test is missing from the patch. > --- > libstdc++-v3/include/bits/version.def | 8 ++ > libstdc++-v3/include/bits/version.h | 10 ++ > libstdc++-v3/include/std/ranges | 189 ++ > 3 files changed, 207 insertions(+) > > diff --git a/libstdc++-v3/include/bits/version.def > b/libstdc++-v3/include/bits/version.def > index 002e560dc0d..6fb5db2e1fc 100644 > --- a/libstdc++-v3/include/bits/version.def > +++ b/libstdc++-v3/include/bits/version.def > @@ -1837,6 +1837,14 @@ ftms = { >}; > }; > > +ftms = { > + name = ranges_cache_latest; > + values = { > +v = 202411; > +cxxmin = 26; > + }; > +}; > + > ftms = { >name = ranges_concat; >values = { > diff --git a/libstdc++-v3/include/bits/version.h > b/libstdc++-v3/include/bits/version.h > index 70de189b1e0..db61a396c45 100644 > --- a/libstdc++-v3/include/bits/version.h > +++ b/libstdc++-v3/include/bits/version.h > @@ -2035,6 +2035,16 @@ > #endif /* !defined(__cpp_lib_is_virtual_base_of) && > defined(__glibcxx_want_is_virtual_base_of) */ > #undef __glibcxx_want_is_virtual_base_of > > +#if !defined(__cpp_lib_ranges_cache_latest) > +# if (__cplusplus > 202302L) > +# define __glibcxx_ranges_cache_latest 202411L > +# if defined(__glibcxx_want_all) || > defined(__glibcxx_want_ranges_cache_latest) > +# define __cpp_lib_ranges_cache_latest 202411L > +# endif > +# endif > +#endif /* !defined(__cpp_lib_ranges_cache_latest) && > defined(__glibcxx_want_ranges_cache_latest) 
*/ > +#undef __glibcxx_want_ranges_cache_latest > + > #if !defined(__cpp_lib_ranges_concat) > # if (__cplusplus > 202302L) > # define __glibcxx_ranges_concat 202403L > diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges > index 5c795a90fbc..db9a00be264 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -58,6 +58,7 @@ > #define __glibcxx_want_ranges_as_const > #define __glibcxx_want_ranges_as_rvalue > #define __glibcxx_want_ranges_cartesian_product > +#define __glibcxx_want_ranges_cache_latest > #define __glibcxx_want_ranges_concat > #define __glibcxx_want_ranges_chunk > #define __glibcxx_want_ranges_chunk_by > @@ -1534,6 +1535,8 @@ namespace views::__adaptor > this->_M_payload._M_apply(_Optional_func{__f}, __i); > return this->_M_get(); > } > + > + using _Optional_base<_Tp>::_M_reset; >}; > > template > @@ -10203,6 +10206,192 @@ namespace ranges > } // namespace ranges > #endif // __cpp_lib_ranges_concat > > +#if __cpp_lib_ranges_cache_latest // C++ >= 26 > +namespace ranges > +{ > + template > +requires view<_Vp> > + class cache_latest_view : public view_interface> > + { > +_Vp _M_base = _Vp(); > + > +using __cache_t = conditional_t>, > + add_pointer_t>, > + range_reference_t<_Vp>>; __conditional_t is cheaper to instantiate than conditional_t, so when it doesn't affect the mangled name of a public symbol we should prefer __conditional_t. 
> +__detail::__non_propagating_cache<__cache_t> _M_cache; > + > +class _Iterator; > +class _Sentinel; > + > + public: > +cache_latest_view() requires default_initializable<_Vp> = default; > + > +constexpr explicit > +cache_latest_view(_Vp __base) > +: _M_base(std::move(__base)) > +{ } > + > +constexpr _Vp > +base() const & requires copy_constructible<_Vp> > +{ return _M_base; } > + > +constexpr _Vp > +base() && > +{ return std::move(_M_base); } > + > +constexpr auto > +begin() > +{ return _Iterator(*this); } > + > +constexpr auto > +end() > +{ return _Sentinel(*this); } > + > +constexpr auto > +size() requires sized_range<_Vp> > +{ return ranges::size(_M_base); } > + > +constexpr auto > +size() const requires sized_range > +{ return ranges::size(_M_base); } > + }; > + > + template > +cache_latest_view(_Range&&) -> cache_latest_view>; > + > + template > +requires view<_Vp> > + class cache_latest_view<_Vp>::_Iterator > + { > +cache_latest_view* _M_parent; > +iterator_t<_Vp> _M_current; > + > +constexpr explicit > +_Iterator(cache_latest_view& __parent) > +: _M_parent(std::__addressof(__parent)), > + _M_current(ranges:
Re: [PATCH] arm: gimple fold aes[ed] [PR114522]
On 12/02/2025 11:01, Christophe Lyon wrote: > Almost a copy/paste from the recent aarch64 version of this patch, > this one is a bit more intrusive because it also introduces > arm_general_gimple_fold_builtin. > > With this patch, > gcc.target/arm/aes_xor_combine.c scan-assembler-not veor > passes again. > > gcc/ChangeLog: > > PR target/114522 > * config/arm/arm-builtins.cc (arm_fold_aes_op): New function. > (arm_general_gimple_fold_builtin): New function. > * config/arm/arm-builtins.h (arm_general_gimple_fold_builtin): New > prototype. > * config/arm/arm.cc (arm_gimple_fold_builtin): Call > arm_general_gimple_fold_builtin as needed. Thanks for picking this up. OK. R. > --- > gcc/config/arm/arm-builtins.cc | 55 ++ > gcc/config/arm/arm-builtins.h | 1 + > gcc/config/arm/arm.cc | 3 ++ > 3 files changed, 59 insertions(+) > > diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc > index e860607686c..c56ab5db985 100644 > --- a/gcc/config/arm/arm-builtins.cc > +++ b/gcc/config/arm/arm-builtins.cc > @@ -45,6 +45,9 @@ > #include "arm-builtins.h" > #include "stringpool.h" > #include "attribs.h" > +#include "basic-block.h" > +#include "gimple.h" > +#include "ssa.h" > > #define SIMD_MAX_BUILTIN_ARGS 7 > > @@ -4053,4 +4056,56 @@ arm_cde_end_args (tree fndecl) > } > } > > +/* Fold a call to vaeseq_u8 and vaesdq_u8. 
> + That is `vaeseq_u8 (x ^ y, 0)` gets folded > + into `vaeseq_u8 (x, y)`.*/ > +static gimple * > +arm_fold_aes_op (gcall *stmt) > +{ > + tree arg0 = gimple_call_arg (stmt, 0); > + tree arg1 = gimple_call_arg (stmt, 1); > + if (integer_zerop (arg0)) > +arg0 = arg1; > + else if (!integer_zerop (arg1)) > +return nullptr; > + if (TREE_CODE (arg0) != SSA_NAME) > +return nullptr; > + if (!has_single_use (arg0)) > +return nullptr; > + auto *s = dyn_cast (SSA_NAME_DEF_STMT (arg0)); > + if (!s || gimple_assign_rhs_code (s) != BIT_XOR_EXPR) > +return nullptr; > + gimple_call_set_arg (stmt, 0, gimple_assign_rhs1 (s)); > + gimple_call_set_arg (stmt, 1, gimple_assign_rhs2 (s)); > + return stmt; > +} > + > +/* Try to fold STMT, given that it's a call to the built-in function with > + subcode FCODE. Return the new statement on success and null on > + failure. */ > +gimple * > +arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt) > +{ > + gimple *new_stmt = NULL; > + > + switch (fcode) > +{ > +case ARM_BUILTIN_CRYPTO_AESE: > +case ARM_BUILTIN_CRYPTO_AESD: > + new_stmt = arm_fold_aes_op (stmt); > + break; > +} > + > + /* GIMPLE assign statements (unlike calls) require a non-null lhs. If we > + created an assign statement with a null lhs, then fix this by assigning > + to a new (and subsequently unused) variable. 
*/ > + if (new_stmt && is_gimple_assign (new_stmt) && !gimple_assign_lhs > (new_stmt)) > +{ > + tree new_lhs = make_ssa_name (gimple_call_return_type (stmt)); > + gimple_assign_set_lhs (new_stmt, new_lhs); > +} > + > + return new_stmt; > +} > + > #include "gt-arm-builtins.h" > diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h > index 1fa85b602d9..3a646619f44 100644 > --- a/gcc/config/arm/arm-builtins.h > +++ b/gcc/config/arm/arm-builtins.h > @@ -32,6 +32,7 @@ enum resolver_ident { > }; > enum resolver_ident arm_describe_resolver (tree); > unsigned arm_cde_end_args (tree); > +gimple *arm_general_gimple_fold_builtin (unsigned int, gcall *); > > #define ENTRY(E, M, Q, S, T, G) E, > enum arm_simd_type > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index a95ddf8201f..00499a26bae 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -76,6 +76,7 @@ > #include "aarch-common.h" > #include "aarch-common-protos.h" > #include "machmode.h" > +#include "arm-builtins.h" > > /* This file should be included last. */ > #include "target-def.h" > @@ -2859,7 +2860,9 @@ arm_gimple_fold_builtin (gimple_stmt_iterator *gsi) >switch (code & ARM_BUILTIN_CLASS) > { > case ARM_BUILTIN_GENERAL: > + new_stmt = arm_general_gimple_fold_builtin (subcode, stmt); >break; > + > case ARM_BUILTIN_MVE: >new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt); > }
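For readers unfamiliar with why the fold is valid: AESE/AESD begin with AddRoundKey, i.e. they XOR their two operands before the rest of the round, so `vaeseq_u8 (x ^ y, 0)` and `vaeseq_u8 (x, y)` feed identical state into the remainder of the instruction. A scalar model of just that first step (an illustration of the identity, not the real intrinsic):

```cpp
#include <cstdint>

// AESE/AESD start with AddRoundKey: state = data ^ key.  The subsequent
// SubBytes/ShiftRows work depends only on this XOR result, so only this
// step matters for the soundness of the fold.
uint8_t add_round_key(uint8_t data, uint8_t key)
{
  return data ^ key;
}

// aese(x ^ y, 0) and aese(x, y) therefore see identical input state,
// which is what arm_fold_aes_op exploits to drop the explicit veor.
bool fold_is_sound(uint8_t x, uint8_t y)
{
  return add_round_key(x ^ y, 0) == add_round_key(x, y);
}
```
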
[PATCH] c, v2: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]
On Wed, Feb 05, 2025 at 10:53:24AM -0800, Kees Cook wrote: > On Wed, Feb 05, 2025 at 12:59:58PM +0100, Jakub Jelinek wrote: > > Kees, any progress on this? > > I need to take another run at it. I got stalled out when I discovered > that array-of-char-arrays attributes got applied at the "wrong" depth, > and stuff wasn't working. > > e.g.: > > char acpi_table[TABLE_SIZE][4] __attribute((nonstring)) = { > { "ohai" }, > { "1234" }, > }; > > when nonstring was checked for on something like "acpi_table[2]" it > wouldn't be found, since it was applied at the top level. While I think we should address that, I think it should be handled incrementally, it is basically a change in the nonstring attribute and needs to be dealt with wherever nonstring is handled. In order to speed things up, I took your patch and applied Marek's and my review comments to it, furthermore removed unreachable code - if (warn_cxx_compat || len - unit > avail) ... else if (warn_unterminated_string_initialization) { if (len - unit > avail) ... else ... } makes no sense, as the second len - unit > avail will always be false. And tweaked the test coverage a little bit as well. Kees, are you submitting this under assignment to FSF (maybe the Google one if it has one) or DCO? See https://gcc.gnu.org/contribute.html#legal for details. If DCO, can you add your Signed-off-by: tag for it? So far lightly tested, ok for trunk if it passes bootstrap/regtest? 2025-02-13 Kees Cook Jakub Jelinek PR c/117178 gcc/ * doc/invoke.texi (Wunterminated-string-initialization): Document the new interaction between this warning and -Wc++-compat and that initialization of decls with nonstring attribute aren't warned about. gcc/c-family/ * c.opt (Wunterminated-string-initialization): Don't depend on -Wc++-compat. gcc/c/ * c-typeck.cc (digest_init): Add DECL argument.
Adjust wording of pedwarn_init for too long strings and provide details on the lengths, for string literals where just the trailing NULL doesn't fit warn for warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++" and provides details on lengths, otherwise for warn_unterminated_string_initialization adjust the warning, provide details on lengths and don't warn if get_attr_nonstring_decl (decl). (build_c_cast, store_init_value, output_init_element): Adjust digest_init callers. gcc/testsuite/ * gcc.dg/Wunterminated-string-initialization.c: Add additional test coverage. * gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of the diagnostics. * gcc.dg/Wcxx-compat-23.c: New test. * gcc.dg/Wcxx-compat-24.c: New test. --- gcc/doc/invoke.texi.jj 2025-02-13 10:17:17.320789358 +0100 +++ gcc/doc/invoke.texi 2025-02-13 13:11:42.089042791 +0100 @@ -8661,17 +8661,20 @@ give a larger number of false positives @opindex Wunterminated-string-initialization @opindex Wno-unterminated-string-initialization @item -Wunterminated-string-initialization @r{(C and Objective-C only)} -Warn about character arrays -initialized as unterminated character sequences -with a string literal. +Warn about character arrays initialized as unterminated character sequences +with a string literal, unless the declaration being initialized has +the @code{nonstring} attribute. For example: @smallexample -char arr[3] = "foo"; +char arr[3] = "foo"; /* Warning. */ +char arr2[3] __attribute__((nonstring)) = "bar"; /* No warning. */ @end smallexample -This warning is enabled by @option{-Wextra} and @option{-Wc++-compat}. -In C++, such initializations are an error. +This warning is enabled by @option{-Wextra}. If @option{-Wc++-compat} +is enabled, the warning has slightly different wording and warns even +if the declaration being initialized has the @code{nonstring} attribute, +as in C++ such initializations are an error.
@opindex Warray-compare @opindex Wno-array-compare --- gcc/c-family/c.opt.jj 2025-01-02 11:47:29.681229781 +0100 +++ gcc/c-family/c.opt 2025-02-13 12:49:47.187320829 +0100 @@ -1550,7 +1550,7 @@ C ObjC Var(warn_unsuffixed_float_constan Warn about unsuffixed float constants. Wunterminated-string-initialization -C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C ObjC,Wextra || Wc++-compat) +C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C ObjC,Wextra) Warn about character arrays initialized as unterminated character sequences with a string literal. Wunused --- gcc/c/c-typeck.cc.jj2025-01-14 09:36:43.751522483 +0100 +++ gcc/c/c-typeck.cc 2025-02-13 12:52:14.366275230 +0100 @@ -116,8 +116,8 @@ static void push_member_name (tree); static int spelling_length (void); static char *print_spelling (char *); static void warning_init (location_t, int, const char *); -static tree digest_init (location_t, tree, tree, tr
Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]
On Thu, 13 Feb 2025, Jakub Jelinek wrote: > On Thu, Feb 13, 2025 at 12:48:44PM +0100, Richard Biener wrote: > > So what this basically does is ensure we mark DECL_VALUE_EXPR when > > VAR is marked which isn't done when marking a tree node. > > > > That you special-case the hashtable walker is a workaround for > > us not being able to say > > > > struct GTY((mark_extra_stuff)) tree_decl_with_vis { > > > > on 'tree' (or specifically the structs for a VAR_DECL). And that we > > rely on gengtype producing the 'tree' marker. So we rely on the > > hashtable keeping referenced trees live. > > Yes, we could just arrange for gt_ggc_mx_lang_tree_node to additionally > mark DECL_VALUE_EXPR for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set (dunno how > exactly). > I think what the patch does should be slightly cheaper, we avoid those > DECL_VALUE_EXPR hash table lookups in the common case where DECL_VALUE_EXPR > of marked variables just refers to trees which reference only marked > VAR_DECLs and no unmarked ones. Agreed, I also don't know how to inject additional code into gt_ggc_mx_lang_tree_node. Richard.
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 1:47 AM, Robin Dapp wrote: Other thoughts? The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff we can't/don't model in the pipeline, but I have no idea how to model the VL=0 case there. Maybe so, but what Edwin is doing looks sensible enough. It wouldn't be the first time a hook got (ab)used in ways that weren't part of the original intent. I don't fully understand what's happening. So the hoisting is being done speculatively here? And it just happens to be "bad" because that might cause a VL=0 case. But are we sure a lack of speculation cannot cause such cases? Yes/No. The scheduler certainly has code to avoid hoisting when doing so would change semantics. That's not what's happening here. I'd have to put it in a debugger or read the full dumps with some crazy scheduler dump verbosity setting to be sure, but what I suspect is happening is the scheduler is processing a multi-block region (effectively an extended basic block). In this scenario the scheduler can pull insns from a later block into an earlier block, including past a conditional branch as long as it doesn't change program semantics. Also, why doesn't the vsetvl pass fix the situation? IMHO we need to understand the problem more thoroughly before changing things. In the end LCM minimizes the number of vsetvls and inserts them at the "earliest" point. If that is not sufficient I'd say we need to modify the constraints (maybe on a per-uarch basis)? The vsetvl pass is LCM based. So it's not allowed to add a vsetvl on a path that didn't have a vsetvl before. Consider this simple graph.

   0
  / \
 2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl will land in bb2. bb0 is not a valid insertion point for the vsetvl pass because the path 0->3 doesn't strictly need a vsetvl. That's inherent in the LCM algorithm (anticipatable). The scheduler has no such limitations. The scheduler might create a scheduling region out of blocks 0 and 2.
In that scenario, insns from block 2 may speculate into block 0 as long as doing so doesn't change semantics. On a separate note: How about we move the vsetvl pass after sched2? Then we could at least rely on LCM doing its work uninhibited and wouldn't reorder vsetvls afterwards. Or do we somehow rely on rtl_dce and BB reorder to run afterwards? That won't help with the problem here but might with others. It's a double-edged sword. If you defer placement until after scheduling, then the vsetvls can wreak havoc with whatever schedule sched2 came up with. It won't matter much for out-of-order designs, but potentially does for others. In theory at sched2 time the insn stream should be fixed. There are practical/historical exceptions, but changes to the insn stream after that point are discouraged. jeff
RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]
On Wed, 12 Feb 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Tamar Christina
> > Sent: Wednesday, February 12, 2025 3:20 PM
> > To: Richard Biener
> > Cc: gcc-patches@gcc.gnu.org; nd
> > Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load
> > [PR118464]
> >
> > > -Original Message-
> > > From: Richard Biener
> > > Sent: Wednesday, February 12, 2025 2:58 PM
> > > To: Tamar Christina
> > > Cc: gcc-patches@gcc.gnu.org; nd
> > > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> > > [PR118464]
> > >
> > > On Tue, 11 Feb 2025, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This fixes two PRs on early break vectorization by delaying the
> > > > safety checks to vectorizable_load, when the VF, VMAT and vectype
> > > > are all known.
> > > >
> > > > This patch does add two new restrictions:
> > > >
> > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject
> > > >    uneven group sizes, as they are unaligned every n % 2 iterations
> > > >    and so may cross a page unwittingly.
> > > >
> > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject
> > > >    vectorization if we cannot peel for alignment, as the alignment
> > > >    requirement is quite large at GROUP_SIZE * vectype_size.  This is
> > > >    unlikely to ever be beneficial so we don't support it for now.
> > > >
> > > > There are other steps documented inside the code itself so that the
> > > > reasoning is next to the code.
> > > >
> > > > Note that for VLA I have still left this fully disabled when not
> > > > working on a fixed buffer.
> > > >
> > > > For VLA targets like SVE, return element alignment as the desired
> > > > vector alignment.  This means that the loads are never misaligned
> > > > and so, annoyingly, it won't ever need to peel.
> > > >
> > > > So what I think needs to happen in GCC 16 is this:
> > > >
> > > > 1. During vect_compute_data_ref_alignment we need to take the max of
> > > >    POLY_VALUE_MIN and vector_alignment.
> > > >
> > > > 2. In vect_do_peeling, define skip_vector when PFA for VLA, and in
> > > >    the guard add a check that ncopies * vectype does not exceed
> > > >    POLY_VALUE_MAX, which we use as a proxy for pagesize.
> > > >
> > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > > >    vect_determine_partial_vectors_and_peeling since the first
> > > >    iteration has to be partial.  If
> > > >    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> > > >    vectorize.
> > > >
> > > > 4. Create a default mask to be used, so that
> > > >    vect_use_loop_mask_for_alignment_p becomes true and we generate
> > > >    the peeled check through loop control for partial loops.  From
> > > >    what I can tell this won't work for
> > > >    LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling
> > > >    support at all in the compiler.  That would need to be done
> > > >    independently from the above.
> > >
> > > We basically need to implement peeling/versioning for alignment based
> > > on the actual POLY value with the fallback being first-fault loads.
> > >
> > > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > > > downstream.
> > > >
> > > > Bootstrapped and regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no
> > > > issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >     PR tree-optimization/118464
> > > >     PR tree-optimization/116855
> > > >     * doc/invoke.texi (min-pagesize): Update docs with vectorizer
> > > >     use.
> > > >     * tree-vect-data-refs.cc
> > > >     (vect_analyze_early_break_dependences): Delay checks.
> > > >     (vect_compute_data_ref_alignment): Remove alignment checks and
> > > >     move to get_load_store_type, increase group access alignment.
> > > >     (vect_enhance_data_refs_alignment): Add note to comment needing
> > > >     investigating.
> > > >     (vect_analyze_data_refs_alignment): Likewise.
> > > >     (vect_supportable_dr_alignment): For group loads look at first
> > > >     DR.
> > > >     * tree-vect-stmts.cc (get_load_store_type): Perform safety
> > > >     checks for early break pfa.
> > > >     * tree-vectorizer.h (dr_peeling_alignment,
> > > >     dr_set_peeling_alignment): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >     PR tree-optimization/118464
> > > >     PR tree-optimization/116855
> > > >     * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes
> > > >     because the load type is relaxed later.
> > > >     * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > >     * gcc.dg/vect/vect-early-break_22.c: Reje
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 5:12 AM, Vineet Gupta wrote:
> On 2/13/25 14:17, Robin Dapp wrote:
>>>>> Other thoughts?
>>>> The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for
>>>> stuff we can't/don't model in the pipeline, but I have no idea how to
>>>> model the VL=0 case there.
>>> Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't
>>> be the first time a hook got (ab)used in ways that weren't part of the
>>> original intent.
>> I don't fully understand what's happening.  So the hoisting is being
>> done speculatively here?  And it just happens to be "bad" because that
>> might cause a VL=0 case.  But are we sure a lack of speculation cannot
>> cause such cases?
> Exactly.  My gut feeling w/o deep dive was this seemed like papering
> over the issue.

Perhaps, but I'm pretty confident that even if this specific situation
turns out to be slightly different, the scenario I see can/will happen
elsewhere.

> BTW what exactly is speculative scheduling?  As in, what is it actually
> trying to schedule ahead?

In simplest terms, assume we have this kind of graph:

    0
   / \
  1-->2

The scheduler knows how to build scheduling regions, essentially extended
basic blocks.  In this case we have two regions, one with the blocks 0,1,
the other being just block 2.  In the multi-block region 0,1 we allow
insns from block 1 to speculate into block 0.

Let's assume we're on a simple 2-wide in-order machine and somewhere in
bb0 there's a slot available for an insn that we couldn't fill with
anything useful from bb0.  In that case we may speculate an insn from bb1
into bb0 to execute "for free" in that unused slot.

That's the basic idea.  It was particularly helpful for in-order cores in
the past.  It's dramatically less important for an out-of-order core since
those are likely doing the speculation in hardware.  Naturally, if you're
using icounts for evaluation this kind of behavior is highly undesirable,
since that kind of evaluation says the transformation is bad, but in
reality on certain designs it is quite helpful.

Jeff
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
> The vsetvl pass is LCM based.  So it's not allowed to add a vsetvl on a
> path that didn't have a vsetvl before.  Consider this simple graph.
>
>     0
>    / \
>   2-->3
>
> If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
> will land in bb2.  bb0 is not a valid insertion point for the vsetvl
> pass because the path 0->3 doesn't strictly need a vsetvl.  That's
> inherent in the LCM algorithm (anticipatable).

Yeah, I remember the same issue with the rounding-mode setter placement.

Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in
bb3 (or any other block that doesn't require one)?  Such a dummy vsetvl
would be fusible with every other vsetvl.  If there are dummy vsetvls
remaining after LCM, just delete them?

Just thinking out loud, the devil will be in the details.

-- 
Regards
Robin
[PATCH] [ifcombine] cope with signbit tests of extended values
A compare with zero may be taken as a sign bit test by
fold_truth_andor_for_ifcombine, but the operand may be extended from a
narrower field.  If the operand was narrower, the bitsize will reflect the
narrowing conversion, but if it was wider, we'll only know whether the
field is sign- or zero-extended from unsignedp, and we won't know whether
it needed to be extended, because arg will have changed to the narrower
variable by the point we can compute the arg width.

If it's sign-extended, we're testing the right bit, but if it's
zero-extended, there isn't any bit we can test.  Instead of punting and
leaving the foldable compare to be figured out by another pass, arrange
for the sign bit resulting from the widening zero-extension to be taken
as zero, so that the modified compare will yield the desired result.

While at that, avoid swapping the right-hand compare operands when we've
already determined that it was a signbit test: it's no use to even try.

Regstrapped on x86_64-linux-gnu.  Ok to install?

for gcc/ChangeLog

        PR tree-optimization/118805
        * gimple-fold.cc (fold_truth_andor_for_ifcombine): Detect and
        cope with zero-extension in signbit tests.  Reject swapping
        right-compare operands if rsignbit.

for gcc/testsuite/ChangeLog

        PR tree-optimization/118805
        * gcc.dg/field-merge-26.c: New.
---
 gcc/gimple-fold.cc                    |   22 +-
 gcc/testsuite/gcc.dg/field-merge-26.c |   20 
 2 files changed, 37 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-26.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 29191685a43c5..0380c7af4c213 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8090,14 +8090,16 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
 
   /* Prepare to turn compares of signed quantities with zero into
      sign-bit tests.  We need not worry about *_reversep here for these compare
-     rewrites: loads will have already been reversed before compares.  */
-  bool lsignbit = false, rsignbit = false;
+     rewrites: loads will have already been reversed before compares.  Save the
+     precision, because [lr]l_arg may change and we won't be able to tell how
+     wide it was originally.  */
+  unsigned lsignbit = 0, rsignbit = 0;
   if ((lcode == LT_EXPR || lcode == GE_EXPR)
       && integer_zerop (lr_arg)
       && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
       && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
     {
-      lsignbit = true;
+      lsignbit = TYPE_PRECISION (TREE_TYPE (ll_arg));
       lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
     }
   /* Turn compares of unsigned quantities with powers of two into
@@ -8130,7 +8132,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
       && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
       && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
     {
-      rsignbit = true;
+      rsignbit = TYPE_PRECISION (TREE_TYPE (rl_arg));
       rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
     }
   else if ((rcode == LT_EXPR || rcode == GE_EXPR)
@@ -8204,7 +8206,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
       || ! operand_equal_p (ll_inner, rl_inner, 0))
     {
       /* Try swapping the operands.  */
-      if (ll_reversep != rr_reversep
+      if (ll_reversep != rr_reversep || rsignbit
 	  || !operand_equal_p (ll_inner, rr_inner, 0))
 	return 0;
 
@@ -8284,6 +8286,14 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   if (lsignbit)
     {
       wide_int sign = wi::mask (ll_bitsize - 1, true, ll_bitsize);
+      /* If ll_arg is zero-extended and we're testing the sign bit, we know
+	 what the result should be.  Shifting the sign bit out of sign will get
+	 us to mask the entire field out, yielding zero, i.e., the sign bit of
+	 the zero-extended value.  We know the masked value is being compared
+	 with zero, so the compare will get us the result we're looking
+	 for: TRUE if EQ_EXPR, FALSE if NE_EXPR.  */
+      if (lsignbit > ll_bitsize && ll_unsignedp)
+	sign <<= 1;
       if (!ll_and_mask.get_precision ())
 	ll_and_mask = sign;
       else
@@ -8303,6 +8313,8 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
   if (rsignbit)
     {
       wide_int sign = wi::mask (rl_bitsize - 1, true, rl_bitsize);
+      if (rsignbit > rl_bitsize && ll_unsignedp)
+	sign <<= 1;
       if (!rl_and_mask.get_precision ())
 	rl_and_mask = sign;
       else
diff --git a/gcc/testsuite/gcc.dg/field-merge-26.c b/gcc/testsuite/gcc.dg/field-merge-26.c
new file mode 100644
index 0..96d7e7205c5f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-26.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-ccp -fno-tree-copy-prop -fno-tree-forwprop -fno-tree-fre" } */
+
+/* PR tree-optimization/118805 */
+
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
> Yeah, I remember the same issue with the rounding-mode setter placement. > > Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3 > (or any other block that doesn't require one)? Such a dummy vsetvl would be > fusible with every other vsetvl. If there are dummy vsetvls remaining after > LCM just delete them? > > Just thinking out loud, the devil will be in the details. Register liveness is of course relevant here. Will surely depend on the specific example whether that makes sense or not. -- Regards Robin
[PATCH v2 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors.  To simplify them, we first use
TImode vector operations instead of the UNSPECs, then adopt an approach
from AArch64: using a special predicate to match the const vectors for
odd/even indices in define_insns, and generating those vectors in
define_expands.

For "backward compatibility" we need to provide a "punned" version of
the operations invoking TImode vectors, as the intrinsics still expect
DImode vectors.

The diffstat is "201 insertions, 905 deletions."

gcc/ChangeLog:

        * config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove.
        (UNSPEC_LASX_XVADDWEV2): Remove.
        (UNSPEC_LASX_XVADDWEV3): Remove.
        (UNSPEC_LASX_XVSUBWEV): Remove.
        (UNSPEC_LASX_XVSUBWEV2): Remove.
        (UNSPEC_LASX_XVMULWEV): Remove.
        (UNSPEC_LASX_XVMULWEV2): Remove.
        (UNSPEC_LASX_XVMULWEV3): Remove.
        (UNSPEC_LASX_XVADDWOD): Remove.
        (UNSPEC_LASX_XVADDWOD2): Remove.
        (UNSPEC_LASX_XVADDWOD3): Remove.
        (UNSPEC_LASX_XVSUBWOD): Remove.
        (UNSPEC_LASX_XVSUBWOD2): Remove.
        (UNSPEC_LASX_XVMULWOD): Remove.
        (UNSPEC_LASX_XVMULWOD2): Remove.
        (UNSPEC_LASX_XVMULWOD3): Remove.
        (lasx_xvwev_h_b): Remove.
        (lasx_xvwev_w_h): Remove.
        (lasx_xvwev_d_w): Remove.
        (lasx_xvaddwev_q_d): Remove.
        (lasx_xvsubwev_q_d): Remove.
        (lasx_xvmulwev_q_d): Remove.
        (lasx_xvwod_h_b): Remove.
        (lasx_xvwod_w_h): Remove.
        (lasx_xvwod_d_w): Remove.
        (lasx_xvaddwod_q_d): Remove.
        (lasx_xvsubwod_q_d): Remove.
        (lasx_xvmulwod_q_d): Remove.
        (lasx_xvaddwev_q_du): Remove.
        (lasx_xvsubwev_q_du): Remove.
        (lasx_xvmulwev_q_du): Remove.
        (lasx_xvaddwod_q_du): Remove.
        (lasx_xvsubwod_q_du): Remove.
        (lasx_xvmulwod_q_du): Remove.
        (lasx_xvwev_h_bu_b): Remove.
        (lasx_xvwev_w_hu_h): Remove.
        (lasx_xvwev_d_wu_w): Remove.
        (lasx_xvwod_h_bu_b): Remove.
        (lasx_xvwod_w_hu_h): Remove.
        (lasx_xvwod_d_wu_w): Remove.
        (lasx_xvaddwev_q_du_d): Remove.
        (lasx_xvsubwev_q_du_d): Remove.
        (lasx_xvmulwev_q_du_d): Remove.
        (lasx_xvaddwod_q_du_d): Remove.
        (lasx_xvsubwod_q_du_d): Remove.
        * config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove.
        (UNSPEC_LSX_VADDWEV2): Remove.
        (UNSPEC_LSX_VADDWEV3): Remove.
        (UNSPEC_LSX_VSUBWEV): Remove.
        (UNSPEC_LSX_VSUBWEV2): Remove.
        (UNSPEC_LSX_VMULWEV): Remove.
        (UNSPEC_LSX_VMULWEV2): Remove.
        (UNSPEC_LSX_VMULWEV3): Remove.
        (UNSPEC_LSX_VADDWOD): Remove.
        (UNSPEC_LSX_VADDWOD2): Remove.
        (UNSPEC_LSX_VADDWOD3): Remove.
        (UNSPEC_LSX_VSUBWOD): Remove.
        (UNSPEC_LSX_VSUBWOD2): Remove.
        (UNSPEC_LSX_VMULWOD): Remove.
        (UNSPEC_LSX_VMULWOD2): Remove.
        (UNSPEC_LSX_VMULWOD3): Remove.
        (lsx_vwev_h_b): Remove.
        (lsx_vwev_w_h): Remove.
        (lsx_vwev_d_w): Remove.
        (lsx_vaddwev_q_d): Remove.
        (lsx_vsubwev_q_d): Remove.
        (lsx_vmulwev_q_d): Remove.
        (lsx_vwod_h_b): Remove.
        (lsx_vwod_w_h): Remove.
        (lsx_vwod_d_w): Remove.
        (lsx_vaddwod_q_d): Remove.
        (lsx_vsubwod_q_d): Remove.
        (lsx_vmulwod_q_d): Remove.
        (lsx_vaddwev_q_du): Remove.
        (lsx_vsubwev_q_du): Remove.
        (lsx_vmulwev_q_du): Remove.
        (lsx_vaddwod_q_du): Remove.
        (lsx_vsubwod_q_du): Remove.
        (lsx_vmulwod_q_du): Remove.
        (lsx_vwev_h_bu_b): Remove.
        (lsx_vwev_w_hu_h): Remove.
        (lsx_vwev_d_wu_w): Remove.
        (lsx_vwod_h_bu_b): Remove.
        (lsx_vwod_w_hu_h): Remove.
        (lsx_vwod_d_wu_w): Remove.
        (lsx_vaddwev_q_du_d): Remove.
        (lsx_vsubwev_q_du_d): Remove.
        (lsx_vmulwev_q_du_d): Remove.
        (lsx_vaddwod_q_du_d): Remove.
        (lsx_vsubwod_q_du_d): Remove.
        (lsx_vmulwod_q_du_d): Remove.
        * config/loongarch/loongarch-modes.def: Add V1TI, V1DI, and V4TI.
        * config/loongarch/loongarch-protos.h
        (loongarch_gen_stepped_int_parallel): New function prototype.
        * config/loongarch/loongarch.cc (loongarch_print_operand): Accept
        'O' for printing "ev" or "od."
        (loongarch_gen_stepped_int_parallel): Implement.
        * config/loongarch/loongarch.md (mode): Add V1DI, V1TI, and
        mention V2TI.
        * config/loongarch/predicates.md (vect_par_cnst_even_or_odd_half):
        New define_predicate.
        * config/loongarch/simd.md (WVEC_HALF): New define_mode_attr.
        (simdfmt_w): Likewise.
        (zero_one): New define_int_iterator.
        (ev_od): New define_int_attr.
        (simd_w_evod__): New define_insn.
        (_vw__): New define_expand.
        (simd_w_evod__hetero): New define_insn.
        (_vw__u_): New define_expand.
        (DIVEC): New defin
[PATCH] [testsuite] adjust expectations of x86 vect-simd-clone tests
Some vect-simd-clone tests fail when targeting ancient x86 variants,
because the expected transformations only take place with -msse4 or
higher.  So arrange for these tests to take an -msse4 option on x86, so
that the expected vectorization takes place, but decay to a compile test
if vect.exp would enable execution but the target doesn't have an sse4
runtime.  This requires the new dg-do-if to override the action on a
target while retaining the default action on others, instead of
disabling the test.

We can count on avx512f compile-time support for these tests, because
vect_simd_clones requires that on x86, and that implies sse4 support, so
we need not complicate the scan conditionals with tests for sse4, except
on the last test.

Regstrapped on x86_64-linux-gnu, also tested with gcc-14 targeting
x86_64-elf, targeting a cpu without sse4 support.  Ok to install?

for gcc/ChangeLog

        * doc/sourcebuild.texi (dg-do-if): Document.

for gcc/testsuite/ChangeLog

        * lib/target-supports-dg.exp (dg-do-if): New.
        * gcc.dg/vect/vect-simd-clone-16f.c: Use -msse4 on x86, and skip
        in case execution is enabled but the runtime isn't.
        * gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
        * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
        * gcc.dg/vect/vect-simd-clone-20.c: Likewise, but only skip the
        scan test.
---
 gcc/doc/sourcebuild.texi                        |    5 
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c  |    6 +++--
 gcc/testsuite/lib/target-supports-dg.exp        |   29 +++
 6 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 98ede70f23c05..255d1a451e44d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1128,6 +1128,11 @@ by the specified floating-point factor.
 
 @subsubsection Skip a test for some targets
 @table @code
+@item @{ dg-do-if @var{action} @{ @var{selector} @} @}
+Same as dg-do if the selector matches and the test hasn't already been
+marked as unsupported.  Use it to override an action on a target while
+leaving the default action alone for other targets.
+
 @item @{ dg-skip-if @var{comment} @{ @var{selector} @} [@{ @var{include-opts} @} [@{ @var{exclude-opts} @}]] @}
 Arguments @var{include-opts} and @var{exclude-opts} are lists in
 which each element is a string of zero or more GCC options.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
index 7cd29e894d050..bb3b081b0e3d8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
index 177521dc44531..504465614c989 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
index 4dd51381d73c0..0c418d4324821 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } && { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
index 9f51a68f3a0c8..3e626fc4d4d56 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-s
[PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]
Hi!

The following testcase ICEs, because we have multiple levels of
DECL_VALUE_EXPR VAR_DECLs:

character(kind=1) id_string[1:.id_string] [value-expr: *id_string.55];
character(kind=1)[1:.id_string] * id_string.55 [value-expr: FRAME.107.id_string.55];
integer(kind=8) .id_string [value-expr: FRAME.107..id_string];

id_string is the user variable mentioned in BLOCK_VARS; it has
DECL_VALUE_EXPR because it is a VLA.  id_string.55 is a temporary created
by gimplify_vla_decl as the address that points to the start of the VLA,
which is what is normally used in the IL to access it.  But as this
artificial var is then used inside of a nested function, tree-nested.cc
adds DECL_VALUE_EXPR to it too and moves the actual value into the
FRAME.107 object's member.

Now, remove_unused_locals removes id_string.55 (and various other
VAR_DECLs) from cfun->local_decls, simply because it is not mentioned in
the IL at all (neither is id_string itself, but that is kept in
BLOCK_VARS as it has DECL_VALUE_EXPR).  So, after this point, the
id_string.55 tree isn't referenced from anywhere but id_string's
DECL_VALUE_EXPR.

The next GC collection is triggered, and we are unlucky enough that in
the value_expr_for_decl hash table (the underlying hash map for
DECL_VALUE_EXPR) the id_string.55 entry comes before the id_string entry.
id_string is ggc_marked_p because it is referenced from BLOCK_VARS, but
id_string.55 is not, as we don't mark DECL_VALUE_EXPR anywhere but by
gt_cleare_cache on value_expr_for_decl.  But gt_cleare_cache does two
things: it calls clear_slots on entries where the key is not ggc_marked_p
(so the id_string.55 mapping to FRAME.107.id_string.55 is lost and
DECL_VALUE_EXPR (id_string.55) becomes NULL), but then later we see the
id_string entry, which is ggc_marked_p, so we mark the whole hash table
entry, which sets ggc_set_mark on id_string.55.  But at this point its
DECL_VALUE_EXPR is already lost.

Later, during dwarf2out.cc, we want to emit DW_AT_location for id_string;
we see it has DECL_VALUE_EXPR, so we emit it as an indirection of
id_string.55, for which we again look up DECL_VALUE_EXPR as it has
DECL_HAS_VALUE_EXPR_P.  But as that is NULL, we ICE, instead of finding
it is a subobject of FRAME.107 for which we can find its stack location.

Now, as can be seen in the PR, I've tried to tweak tree-ssa-live.cc so
that it would keep id_string.55 in cfun->local_decls; that prevents its
DECL_VALUE_EXPR from being GCed until expansion, but then we shrink and
free cfun->local_decls completely, and so GC at that point can still
throw it away.

The following patch adds an extension to the GTY ((cache)) option:
by specifying GTY ((cache ("somefn"))), somefn is called on the hash
table before gt_cleare_cache is.  This extra hook can do any additional
ggc_set_mark needed so that gt_cleare_cache preserves everything that is
actually needed and throws away the rest.

In order to make it just 2 passes rather than up to n passes, the patch
calls walk_tree on DECL_VALUE_EXPR of the marked trees, and if it finds a
yet unmarked tree, it marks it and walks its DECL_VALUE_EXPR as well, the
same way.  (If we had say id1 -> something, id2 -> x(id1), id3 -> x(id2),
id4 -> x(id3), id5 -> x(id4) in the value_expr_for_decl hash table in
that order, where idN are VAR_DECLs with DECL_HAS_VALUE_EXPR_P, id5 is
the only one mentioned from outside, idN -> X stands for idN having
DECL_VALUE_EXPR X, something for some arbitrary tree, and x(idN) for some
arbitrary tree which mentions the idN variable, and in each pass we just
marked the "to" part of entries with ggc_marked_p base.from, we'd need to
repeat until we don't mark anything.)

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-13  Jakub Jelinek

        PR debug/118790
        * gengtype.cc (write_roots): Remove cache variable, instead break
        from the loop on match and test o for NULL.  If the cache option
        has non-empty string argument, call the specified function with
        v->name as argument before calling gt_cleare_cache on it.
        * tree.cc (gt_value_expr_mark_2, gt_value_expr_mark_1,
        gt_value_expr_mark): New functions.
        (value_expr_for_decl): Use GTY ((cache ("gt_value_expr_mark")))
        rather than just GTY ((cache)).
        * doc/gty.texi (cache): Document optional argument of cache
        option.
        * gfortran.dg/gomp/pr118790.f90: New test.

--- gcc/gengtype.cc.jj	2025-01-02 11:23:02.613710956 +0100
+++ gcc/gengtype.cc	2025-02-12 17:15:08.560424329 +0100
@@ -4656,13 +4656,12 @@ write_roots (pair_p variables, bool emit
       outf_p f = get_output_file_with_visibility (CONST_CAST (input_file*,
 							      v->line.file));
       struct flist *fli;
-      bool cache = false;
       options_p o;
 
       for (o = v->opt; o; o = o->next)
 	if (strcmp (o->name, "cache") == 0)
-	  cache = true;
-      if (!cache)
+	  break;
+      if (!o)
 	continue;
[PATCH] [testsuite] add x86 effective target
I got tired of repeating the conditional that recognizes ia32 or x86_64,
and introduced 'x86' as a shorthand for that, adjusting all occurrences
in target-supports.exp to set an example.

I found some patterns that recognized i?86* and x86_64*, but I took
those as likely cut&pastos instead of trying to preserve those
weirdnesses.

Regstrapped on x86_64-linux-gnu, also tested with gcc-14 targeting
x86_64-elf.  Ok to install?

for gcc/ChangeLog

        * doc/sourcebuild.texi: Add x86 effective target.

for gcc/testsuite/ChangeLog

        * lib/target-supports.exp (check_effective_target_x86): New.
        Replace all uses of i?86-*-* and x86_64-*-* in this file.
---
 gcc/doc/sourcebuild.texi              |    3 +
 gcc/testsuite/lib/target-supports.exp |  188 +
 2 files changed, 99 insertions(+), 92 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 255d1a451e44d..d4e2a13dd77a4 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2801,6 +2801,9 @@ Target supports the execution of @code{user_msr} instructions.
 @item vect_cmdline_needed
 Target requires a command line argument to enable a SIMD instruction set.
 
+@item x86
+Target is ia32 or x86_64.
+
 @item xorsign
 Target supports the xorsign optab expansion.
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 60e24129bd585..035f82eb86c93 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -740,7 +740,7 @@ proc check_profiling_available { test_what } {
     }
 
     if { $test_what == "-fauto-profile" } {
-	if { !([istarget i?86-*-linux*] || [istarget x86_64-*-linux*]) } {
+	if { !([check_effective_target_x86] && [istarget *-*-linux*]) } {
	    verbose "autofdo only supported on linux"
	    return 0
	}
@@ -2616,17 +2616,23 @@ proc remove_options_for_riscv_zvbb { flags } {
     return [add_options_for_riscv_z_ext zvbb $flags]
 }
 
+# Return 1 if the target is ia32 or x86_64.
+ +proc check_effective_target_x86 { } { +if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } { + return 1 +} else { +return 0 +} +} + # Return 1 if the target OS supports running SSE executables, 0 # otherwise. Cache the result. proc check_sse_os_support_available { } { return [check_cached_effective_target sse_os_support_available { # If this is not the right target then we can skip the test. - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { - expr 0 - } else { - expr 1 - } + expr [check_effective_target_x86] }] } @@ -2636,7 +2642,7 @@ proc check_sse_os_support_available { } { proc check_avx_os_support_available { } { return [check_cached_effective_target avx_os_support_available { # If this is not the right target then we can skip the test. - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { + if { !([check_effective_target_x86]) } { expr 0 } else { # Check that OS has AVX and SSE saving enabled. @@ -2659,7 +2665,7 @@ proc check_avx_os_support_available { } { proc check_avx512_os_support_available { } { return [check_cached_effective_target avx512_os_support_available { # If this is not the right target then we can skip the test. - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { + if { !([check_effective_target_x86]) } { expr 0 } else { # Check that OS has AVX512, AVX and SSE saving enabled. @@ -2682,7 +2688,7 @@ proc check_avx512_os_support_available { } { proc check_sse_hw_available { } { return [check_cached_effective_target sse_hw_available { # If this is not the right target then we can skip the test. - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { + if { !([check_effective_target_x86]) } { expr 0 } else { check_runtime_nocache sse_hw_available { @@ -2706,7 +2712,7 @@ proc check_sse_hw_available { } { proc check_sse2_hw_available { } { return [check_cached_effective_target sse2_hw_available { # If this is not the right target then we can skip the test. 
- if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { + if { !([check_effective_target_x86]) } { expr 0 } else { check_runtime_nocache sse2_hw_available { @@ -2730,7 +2736,7 @@ proc check_sse2_hw_available { } { proc check_sse4_hw_available { } { return [check_cached_effective_target sse4_hw_available { # If this is not the right target then we can skip the test. - if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } { + if { !([check_effective_target_x86]) } { expr 0 } else { check_runtime_nocache sse4_hw_available { @@ -2754,7 +2760,7 @@ proc check_sse4_hw_available { } { proc check_avx_h
[committed] testsuite: Add another range for coroutines testcase [PR118574]
Hi!

On Tue, Feb 11, 2025 at 11:47:09PM +0100, Jason Merrill wrote:
> The implementation in r15-3840 used a novel technique of wrapping the
> entire range-for loop in a CLEANUP_POINT_EXPR, which confused the
> coroutines transformation.  Instead let's use the existing
> extend_ref_init_temps mechanism.
>
> This does not revert all of r15-3840, only the parts that change how
> CLEANUP_POINT_EXPRs are applied to range-for declarations.

Thanks.

Here is a patch which adds another range-for coroutine testcase, which
doesn't extend (across co_await) just the __for_range var and what it
binds to (so it passes even without -frange-for-ext-temps), but also some
other temporaries, and verifies they are destructed in the right order.

Tested on x86_64-linux, committed to trunk as obvious.

2025-02-13  Jakub Jelinek

        PR c++/118574
        * g++.dg/coroutines/range-for2.C: New test.

--- gcc/testsuite/g++.dg/coroutines/range-for2.C.jj	2025-02-13 11:28:48.381043861 +0100
+++ gcc/testsuite/g++.dg/coroutines/range-for2.C	2025-02-13 11:49:03.872040995 +0100
@@ -0,0 +1,92 @@
+// PR c++/118574
+// { dg-do run }
+// { dg-additional-options "-std=c++23 -O2" }
+
+#include <coroutine>
+
+[[gnu::noipa]] void
+baz (int *)
+{
+}
+
+struct D {
+  D () : d (new int (42)) {}
+  ~D () { if (*d != 42) __builtin_abort (); *d = 0; baz (d); delete d; }
+  int *d;
+};
+
+struct E {
+  E (const D &x) : e (x) {}
+  void test () const { if (*e.d != 42) __builtin_abort (); }
+  ~E () { test (); }
+  const D &e;
+};
+
+struct A {
+  const char **a = nullptr;
+  int n = 0;
+  const E *e1 = nullptr;
+  const E *e2 = nullptr;
+  void test () const { if (e1) e1->test (); if (e2) e2->test (); }
+  void push_back (const char *x) { test (); if (!a) a = new const char *[2]; a[n++] = x; }
+  const char **begin () const { test (); return a; }
+  const char **end () const { test (); return a + n; }
+  ~A () { test (); delete[] a; }
+};
+
+struct B {
+  long ns;
+  bool await_ready () const noexcept { return false; }
+  void await_suspend (std::coroutine_handle<> h) const noexcept {
+    volatile int v = 0;
+    while (v < ns)
+      v = v + 1;
+    h.resume ();
+  }
+  void await_resume () const noexcept {}
+};
+
+struct C {
+  struct promise_type {
+    const char *value;
+    std::suspend_never initial_suspend () { return {}; }
+    std::suspend_always final_suspend () noexcept { return {}; }
+    void return_value (const char *v) { value = v; }
+    void unhandled_exception () { __builtin_abort (); }
+    C get_return_object () { return C{this}; }
+  };
+  promise_type *p;
+  explicit C (promise_type *p) : p(p) {}
+  const char *get () { return p->value; }
+};
+
+A
+foo (const E &e1, const E &e2)
+{
+  A a;
+  a.e1 = &e1;
+  a.e2 = &e2;
+  a.push_back ("foo");
+  a.push_back ("bar");
+  return a;
+}
+
+C
+bar ()
+{
+  A ret;
+  for (const auto &item : foo (E{D {}}, E{D {}}))
+    {
+      co_await B{20};
+      ret.push_back (item);
+    }
+  co_return "foobar";
+}
+
+int
+main ()
+{
+  auto task = bar ();
+  if (__builtin_strcmp (task.get (), "foobar"))
+    __builtin_abort ();
+}

Jakub
Re: [PATCH] LoongArch: Accept ADD, IOR or XOR when combining objects with no bits in common [PR115478]
LGTM! Thanks! On 2025/2/11 at 14:34, Xi Ruoyao wrote: Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR. It's generally a good thing (allowing the use of our alsl instruction or similar instructions on other architectures), but it's preventing us from using bytepick. For example, if we shift a __int128 by 16 bits, the higher word can be produced via a single bytepick.d instruction with immediate 2, but we got: srli.d $r12,$r4,48 slli.d $r5,$r5,16 slli.d $r4,$r4,16 add.d $r5,$r12,$r5 jr $r1 This didn't work with GCC 14, but after r15-6490 it's supposed to work if IOR were used instead of PLUS. To fix this, add a code iterator to match IOR, XOR, and PLUS and use it instead of just IOR if we know the operands have no overlapping bits. gcc/ChangeLog: * config/loongarch/loongarch.md (any_or_plus): New define_code_iterator. (bstrins__for_ior_mask): Use any_or_plus instead of ior. (bytepick_w_): Likewise. (bytepick_d_): Likewise. (bytepick_d__rev): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/bytepick_shift_128.c: New test. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.md | 46 +++ .../gcc.target/loongarch/bytepick_shift_128.c | 9 2 files changed, 36 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/bytepick_shift_128.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 2baba13560a..6f507c3c7f6 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -488,6 +488,10 @@ (define_code_attr bitwise_operand [(and "and_operand") (xor "uns_arith_operand")]) (define_code_attr is_and [(and "true") (ior "false") (xor "false")]) +;; If we know the operands do not have overlapping bits, use this +;; instead of just ior to cover more cases. +(define_code_iterator any_or_plus [any_or plus]) + ;; This code iterator allows unsigned and signed division to be generated ;; from the same template. 
(define_code_iterator any_div [div udiv mod umod]) @@ -1588,10 +1592,11 @@ (define_insn "*one_cmplsi2_internal" (define_insn_and_split "*bstrins__for_ior_mask" [(set (match_operand:GPR 0 "register_operand" "=r") - (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r") - (match_operand:GPR 2 "const_int_operand" "i")) -(and:GPR (match_operand:GPR 3 "register_operand" "r") - (match_operand:GPR 4 "const_int_operand" "i"] + (any_or_plus:GPR + (and:GPR (match_operand:GPR 1 "register_operand" "r") + (match_operand:GPR 2 "const_int_operand" "i")) + (and:GPR (match_operand:GPR 3 "register_operand" "r") + (match_operand:GPR 4 "const_int_operand" "i"] "loongarch_pre_reload_split () && loongarch_use_bstrins_for_ior_with_mask (mode, operands)" "#" @@ -4256,12 +4261,13 @@ (define_expand "2" DONE; }) -(define_insn "bytepick_w_" +(define_insn "*bytepick_w_" [(set (match_operand:SI 0 "register_operand" "=r") - (ior:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r") -(const_int )) - (ashift:SI (match_operand:SI 2 "register_operand" "r") - (const_int bytepick_w_ashift_amount] + (any_or_plus:SI + (lshiftrt:SI (match_operand:SI 1 "register_operand" "r") + (const_int )) + (ashift:SI (match_operand:SI 2 "register_operand" "r") +(const_int bytepick_w_ashift_amount] "" "bytepick.w\t%0,%1,%2," [(set_attr "mode" "SI")]) @@ -4299,22 +4305,24 @@ (define_insn "bytepick_w_1_extend" "bytepick.w\t%0,%2,%1,1" [(set_attr "mode" "SI")]) -(define_insn "bytepick_d_" +(define_insn "*bytepick_d_" [(set (match_operand:DI 0 "register_operand" "=r") - (ior:DI (lshiftrt (match_operand:DI 1 "register_operand" "r") - (const_int )) - (ashift (match_operand:DI 2 "register_operand" "r") - (const_int bytepick_d_ashift_amount] + (any_or_plus:DI + (lshiftrt (match_operand:DI 1 "register_operand" "r") + (const_int )) + (ashift (match_operand:DI 2 "register_operand" "r") + (const_int bytepick_d_ashift_amount] "TARGET_64BIT" "bytepick.d\t%0,%1,%2," [(set_attr "mode" "DI")]) -(define_insn 
"bytepick_d__rev" +(define_insn "*bytepick_d__rev" [(set (match_operand:DI 0 "register_operand" "=r") - (ior:DI (ashift (match_operand:DI 1 "register_operand" "r") - (const_int bytepick_d_ashift_amount)) - (lshiftrt (match_operand:DI 2 "register_operand" "r") -
Re: [pushed] c++: don't default -frange-for-ext-temps in -std=gnu++20 [PR118574]
On Wed, Feb 12, 2025 at 12:07:53AM +0100, Jason Merrill wrote: > Tested x86_64-pc-linux-gnu, applying to trunk. > > -- 8< -- > > Since -frange-for-ext-temps has been causing trouble, let's not enable it > by default in pre-C++23 GNU modes for GCC 15, and also allow disabling it in > C++23 and up. The reason for disallowing disabling it for C++23 and up has been feature test macros, but admittedly that will be only a problem if/when C++26 or C++29 etc. add another range for paper and bump __cpp_range_based_for value again. At that point unless that change is conditional on another flag we'd need to require -frange-for-ext-temps to be on. This can certainly wait until that happens (if ever). Jakub
Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description
Hi, Ruoyao: When will it be convenient for you to submit the v2 version of the patch? I am planning to merge the current patches and then test the optimal values for -malign-{functions,labels,jumps,loops} on that basis. On 2025/2/12 at 03:30, Xi Ruoyao wrote: On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote: On 2025/2/7 at 20:09, Xi Ruoyao wrote: /* snip */ - -(define_insn "lasx_xvpickev_w" - [(set (match_operand:V8SI 0 "register_operand" "=f") - (vec_select:V8SI - (vec_concat:V16SI - (match_operand:V8SI 1 "register_operand" "f") - (match_operand:V8SI 2 "register_operand" "f")) - (parallel [(const_int 0) (const_int 2) - (const_int 8) (const_int 10) - (const_int 4) (const_int 6) - (const_int 12) (const_int 14)])))] - "ISA_HAS_LASX" - "xvpickev.w\t%u0,%u2,%u1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V8SI")]) - /* snip */ +;; Picking even/odd elements. +(define_insn "simd_pick_evod_" + [(set (match_operand:ALLVEC 0 "register_operand" "=f") + (vec_select:ALLVEC + (vec_concat: + (match_operand:ALLVEC 1 "register_operand" "f") + (match_operand:ALLVEC 2 "register_operand" "f")) + (match_operand: 3 "vect_par_cnst_even_or_odd_half")))] For LASX, the generated select array is problematic, taking xvpickev.w as an example: xvpickev.w vd,vj,vk The behavior of the instruction is as follows: vd.w[0] = vk.w[0] vd.w[1] = vk.w[2] vd.w[2] = vj.w[0] vd.w[3] = vj.w[2] vd.w[4] = vk.w[4] vd.w[5] = vk.w[6] vd.w[6] = vj.w[4] vd.w[7] = vj.w[6] Oops, stupid me. Strangely the bootstrapping (even with BOOT_CFLAGS="-O2 -g -march=la664") and regtesting could not catch it. I'll limit this to LSX in v2.
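A small Python model (illustrative only, not GCC code) makes the mismatch Lulu describes concrete: LASX xvpickev.w picks even elements independently within each 128-bit lane, which is not the flat even-element selection that vect_par_cnst_even_or_odd_half describes.

```python
def xvpickev_w(vj, vk):
    """Model of LASX xvpickev.w vd,vj,vk (8 x 32-bit elements):
    even elements are picked per 128-bit lane, per the email above."""
    return [vk[0], vk[2], vj[0], vj[2],    # low 128-bit lane
            vk[4], vk[6], vj[4], vj[6]]    # high 128-bit lane

def flat_even_select(vk, vj):
    """What a flat vec_select of the even elements of
    (vec_concat op1 op2), i.e. concat(vk, vj), would compute."""
    return (vk + vj)[0::2]

vj = [10, 11, 12, 13, 14, 15, 16, 17]
vk = [20, 21, 22, 23, 24, 25, 26, 27]
print(xvpickev_w(vj, vk))        # [20, 22, 10, 12, 24, 26, 14, 16]
print(flat_even_select(vk, vj))  # [20, 22, 24, 26, 10, 12, 14, 16]
```

The two results differ in the middle four lanes, which is why the unified pattern is only safe for LSX (a single 128-bit lane) in v2.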
[COMMITTED] doc: Update install.texi for GCC 15 on Solaris
Apart from minor updates, this patch is primarily an important caveat about binutils PR ld/32580, which has broken the binutils 2.44 ld on Solaris/x86. Tested on i386-pc-solaris2.11, committed to trunk. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2025-02-11 Rainer Orth gcc: * doc/install.texi (Specific, *-*-solaris2*): Updates for newer Solaris 11.4 SRUs and binutils 2.44. # HG changeset patch # Parent e96fa536cfda3b63e25f7fa1bd6b17875d7ec056 doc: Update install.texi for GCC 15 diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -4840,7 +4840,7 @@ Support for Solaris 10 has been removed 9 has been removed in GCC 5. Support for Solaris 8 has been removed in GCC 4.8. Support for Solaris 7 has been removed in GCC 4.6. -Solaris 11.4 provides one or more of GCC 5, 7, 9, 10, 11, 12, and 13. +Solaris 11.4 provides one or more of GCC 5, 7, 9, 10, 11, 12, 13, and 14. You need to install the @code{system/header}, @code{system/linker}, and @code{developer/assembler} packages. @@ -4862,7 +4862,7 @@ conjunction with the Solaris linker. The GNU @command{as} versions included in Solaris 11.4, from GNU binutils 2.30.1 or newer (in @file{/usr/bin/gas} and @file{/usr/gnu/bin/as}), are known to work. The version from GNU -binutils 2.42 is known to work as well. Recent versions of the Solaris +binutils 2.44 is known to work as well. Recent versions of the Solaris assembler in @file{/usr/bin/as} work almost as well, though. To use GNU @command{as}, configure with the options @option{--with-gnu-as --with-as=@//usr/@/gnu/@/bin/@/as}. @@ -4870,9 +4870,12 @@ assembler in @file{/usr/bin/as} work alm For linking, the Solaris linker is preferred. If you want to use the GNU linker instead, the version in Solaris 11.4, from GNU binutils 2.30.1 or newer (in @file{/usr/gnu/bin/ld} and @file{/usr/bin/gld}), -works, as does the version from GNU binutils 2.42. However, it +works. 
However, it generally lacks platform specific features, so better stay with Solaris -@command{ld}. To use the LTO linker plugin +@command{ld}. When using the version from GNU binutils 2.44, there's +an important caveat: binutils @emph{must} be configured with +@code{CONFIG_SHELL=/bin/bash}, otherwise the linker's built-in linker +scripts get corrupted on x86. To use the LTO linker plugin (@option{-fuse-linker-plugin}) with GNU @command{ld}, GNU binutils @emph{must} be configured with @option{--enable-largefile}. To use Solaris @command{ld}, we recommend to configure with @@ -4894,7 +4897,7 @@ will be disabled if no appropriate versi work. In order to build the GNU Ada compiler, GNAT, a working GNAT is needed. -Since Solaris 11.4 SRU 39, GNAT 11, 12 or 13 is bundled in the +Since Solaris 11.4 SRU 39, GNAT 11, 12, 13 or 14 is bundled in the @code{developer/gcc/gcc-gnat} package. In order to build the GNU D compiler, GDC, a working @samp{libphobos} is
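The binutils caveat above boils down to one configure invocation; a sketch follows (the source path is hypothetical; CONFIG_SHELL=/bin/bash and --enable-largefile are the requirements stated in the install.texi text):

```shell
# Illustrative only: configuring GNU binutils 2.44 on Solaris/x86.
# CONFIG_SHELL=/bin/bash avoids the built-in linker scripts getting
# corrupted (binutils PR ld/32580); --enable-largefile is needed for
# the LTO linker plugin (-fuse-linker-plugin) with GNU ld.
CONFIG_SHELL=/bin/bash \
  /path/to/binutils-2.44/configure --enable-largefile
CONFIG_SHELL=/bin/bash gmake
```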
Re: [PATCH v2] x86: Properly find the maximum stack slot alignment
On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. > > gcc/ > > PR target/109780 > PR target/109093 > * config/i386/i386.cc (ix86_update_stack_alignment): New. > (ix86_find_all_reg_use_1): Likewise. > (ix86_find_all_reg_use): Likewise. > (ix86_find_max_used_stack_alignment): Also check memory accesses > from registers defined by stack or frame registers. > > gcc/testsuite/ > > PR target/109780 > PR target/109093 > * g++.target/i386/pr109780-1.C: New test. > * gcc.target/i386/pr109093-1.c: Likewise. > * gcc.target/i386/pr109780-1.c: Likewise. > * gcc.target/i386/pr109780-2.c: Likewise. > * gcc.target/i386/pr109780-3.c: Likewise. Some non-algorithmical changes below, otherwise LGTM. Please also get someone to review dataflow infrastructure usage, I am not well versed with it. +/* Helper function for ix86_find_all_reg_use. */ + +static void +ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access, + auto_bitmap &worklist) +{ + rtx src = SET_SRC (set); + if (MEM_P (src)) Also reject assignment from CONST_SCALAR_INT? +return; + + rtx dest = SET_DEST (set); + if (!REG_P (dest)) +return; Can we switch these two so the test for REG_P (dest) will be first? We are not interested in anything that doesn't assign to a register. +/* Find all registers defined with REG. 
*/ + +static void +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access, + unsigned int reg, auto_bitmap &worklist) +{ + for (df_ref ref = DF_REG_USE_CHAIN (reg); + ref != NULL; + ref = DF_REF_NEXT_REG (ref)) +{ + if (DF_REF_IS_ARTIFICIAL (ref)) +continue; + + rtx_insn *insn = DF_REF_INSN (ref); + if (!NONDEBUG_INSN_P (insn)) +continue; Here we pass only NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X) + if (CALL_P (insn) || JUMP_P (insn)) +continue; And here remains only NONJUMP_INSN_P (X), so both above conditions could be substituted with: if (!NONJUMP_INSN_P (X)) continue; + + rtx set = single_set (insn); + if (set) +ix86_find_all_reg_use_1 (set, stack_slot_access, worklist); + + rtx pat = PATTERN (insn); + if (GET_CODE (pat) != PARALLEL) +continue; + + for (int i = 0; i < XVECLEN (pat, 0); i++) +{ + rtx exp = XVECEXP (pat, 0, i); + switch (GET_CODE (exp)) +{ +case ASM_OPERANDS: +case CLOBBER: +case PREFETCH: +case USE: + break; +case UNSPEC: +case UNSPEC_VOLATILE: + for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--) +{ + rtx x = XVECEXP (exp, 0, j); + if (GET_CODE (x) == SET) +ix86_find_all_reg_use_1 (x, stack_slot_access, + worklist); +} + break; +case SET: + ix86_find_all_reg_use_1 (exp, stack_slot_access, + worklist); + break; +default: + debug_rtx (exp); Stray debug remaining? + HARD_REG_SET stack_slot_access; + CLEAR_HARD_REG_SET (stack_slot_access); + + /* Stack slot can be accessed by stack pointer, frame pointer or + registers defined by stack pointer or frame pointer. */ + auto_bitmap worklist; Please put a line of vertical space here ... + add_to_hard_reg_set (&stack_slot_access, Pmode, + STACK_POINTER_REGNUM); + bitmap_set_bit (worklist, STACK_POINTER_REGNUM); ... here ... + if (frame_pointer_needed) +{ + add_to_hard_reg_set (&stack_slot_access, Pmode, + HARD_FRAME_POINTER_REGNUM); + bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM); +} ... here ... + unsigned int reg; ... here ... 
+ do +{ + reg = bitmap_clear_first_set_bit (worklist); + ix86_find_all_reg_use (stack_slot_access, reg, worklist); +} + while (!bitmap_empty_p (worklist)); + + hard_reg_set_iterator hrsi; ... here ... + EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi) +for (df_ref ref = DF_REG_USE_CHAIN (reg); + ref != NULL; + ref = DF_REF_NEXT_REG (ref)) + { +if (DF_REF_IS_ARTIFICIAL (ref)) + continue; + +rtx_insn *insn = DF_REF_INSN (ref); ... and here. +if (!NONDEBUG_INSN_P (insn)) !NONJUMP_INSN_P ? + continue; Also some vertical space here. +note_stores (insn, ix86_update_stack_alignment, + &stack_alignment); + } } diff --git a/gcc/testsuite/gcc.target/i386/pr109093-1.c b/gcc/testsuite/gcc.target/i386/pr109093-1.c new file mode 100644 index 000..0459d1947f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109093-1.c @@ -0,0 +1,39 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx2 -mtune=znver1 -ftrivial-auto-var-init=zero -fno-stack-protector" } */ + Please use /* { dg-do run { target
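The register-discovery loop under review is a standard fixed-point worklist closure; a rough Python model of the idea (names invented for illustration — this stands in for the DF use-chain walk, it is not the df API):

```python
def stack_access_regs(def_edges, seeds):
    """Closure of registers whose values derive from the stack/frame
    pointers.  def_edges[r] lists registers defined by insns that use
    register r (a stand-in for DF_REG_USE_CHAIN + single_set)."""
    access = set(seeds)        # corresponds to HARD_REG_SET stack_slot_access
    worklist = list(seeds)     # corresponds to the auto_bitmap worklist
    while worklist:
        reg = worklist.pop()
        for dest in def_edges.get(reg, ()):
            if dest not in access:   # enqueue each register only once
                access.add(dest)
                worklist.append(dest)
    return access

# sp feeds r10, r10 feeds r11; r12 derives from an unrelated register.
edges = {"sp": ["r10"], "r10": ["r11"], "r9": ["r12"]}
print(sorted(stack_access_regs(edges, ["sp"])))  # ['r10', 'r11', 'sp']
```

Memory accesses through any register in the resulting set are then scanned (note_stores in the patch) to update the maximum used stack alignment.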
[COMMITTED] build: Remove HAVE_LD_EH_FRAME_CIEV3
Old versions of Solaris ld and GNU ld didn't support CIEv3 in .eh_frame. To avoid this breaking the build [build] Default to DWARF 4 on Solaris if linker supports CIEv3 http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00669.html checked for the necessary linker support, defaulting to DWARF-2 if necessary. Solaris ld was fixed in Solaris 11.1, GNU ld in binutils 2.16, so this is long obsolete and only used in Solaris code anyway. This patch thus removes both the configure check and solaris_override_options. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11. Committed to trunk. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2025-02-12 Rainer Orth gcc: * configure.ac (gcc_cv_ld_eh_frame_ciev3): Remove. * configure, config.in: Regenerate. * config/sol2.cc (solaris_override_options): Remove. * config/sol2.h (SUBTARGET_OVERRIDE_OPTIONS): Remove. * config/sol2-protos.h (solaris_override_options): Remove. # HG changeset patch # Parent 172c287f84e717c376d7214926fa3c33845335cb build: Remove HAVE_LD_EH_FRAME_CIEV3 diff --git a/gcc/config.in b/gcc/config.in --- a/gcc/config.in +++ b/gcc/config.in @@ -1774,12 +1774,6 @@ #endif -/* Define 0/1 if your linker supports CIE v3 in .eh_frame. */ -#ifndef USED_FOR_TARGET -#undef HAVE_LD_EH_FRAME_CIEV3 -#endif - - /* Define if your linker supports .eh_frame_hdr. */ #undef HAVE_LD_EH_FRAME_HDR diff --git a/gcc/config/sol2-protos.h b/gcc/config/sol2-protos.h --- a/gcc/config/sol2-protos.h +++ b/gcc/config/sol2-protos.h @@ -24,7 +24,6 @@ extern void solaris_elf_asm_comdat_secti extern void solaris_file_end (void); extern void solaris_insert_attributes (tree, tree *); extern void solaris_output_init_fini (FILE *, tree); -extern void solaris_override_options (void); /* In sol2-c.cc. 
*/ extern void solaris_register_pragmas (void); diff --git a/gcc/config/sol2.cc b/gcc/config/sol2.cc --- a/gcc/config/sol2.cc +++ b/gcc/config/sol2.cc @@ -291,13 +291,4 @@ solaris_file_end (void) (NULL); } -void -solaris_override_options (void) -{ - /* Older versions of Solaris ld cannot handle CIE version 3 in .eh_frame. - Don't emit DWARF3/4 unless specifically selected if so. */ - if (!HAVE_LD_EH_FRAME_CIEV3 && !OPTION_SET_P (dwarf_version)) -dwarf_version = 2; -} - #include "gt-sol2.h" diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -119,11 +119,6 @@ along with GCC; see the file COPYING3. TARGET_SUB_OS_CPP_BUILTINS(); \ } while (0) -#define SUBTARGET_OVERRIDE_OPTIONS \ - do { \ -solaris_override_options (); \ - } while (0) - #if DEFAULT_ARCH32_P #define MULTILIB_DEFAULTS { "m32" } #else diff --git a/gcc/configure b/gcc/configure --- a/gcc/configure +++ b/gcc/configure @@ -32369,46 +32369,6 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_eh_frame_hdr" >&5 $as_echo "$gcc_cv_ld_eh_frame_hdr" >&6; } -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker CIEv3 in .eh_frame support" >&5 -$as_echo_n "checking linker CIEv3 in .eh_frame support... 
" >&6; } -gcc_cv_ld_eh_frame_ciev3=no -if test $in_tree_ld = yes ; then - if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 16 -o "$gcc_cv_gld_major_version" -gt 2 \ - && test $in_tree_ld_is_elf = yes; then -gcc_cv_ld_eh_frame_ciev3=yes - fi -elif test x$gcc_cv_ld != x; then - if echo "$ld_ver" | grep GNU > /dev/null; then -gcc_cv_ld_eh_frame_ciev3=yes -if test 0"$ld_date" -lt 20040513; then - if test -n "$ld_date"; then - # If there was date string, but was earlier than 2004-05-13, fail - gcc_cv_ld_eh_frame_ciev3=no - elif test "$ld_vers_major" -lt 2; then - gcc_cv_ld_eh_frame_ciev3=no - elif test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 16; then - gcc_cv_ld_eh_frame_ciev3=no - fi -fi - else -case "$target" in - *-*-solaris2*) -# Sun ld added support for CIE v3 in .eh_frame in Solaris 11.1. -if test "$ld_vers_major" -gt 1 || test "$ld_vers_minor" -ge 2324; then - gcc_cv_ld_eh_frame_ciev3=yes -fi -;; -esac - fi -fi - -cat >>confdefs.h <<_ACEOF -#define HAVE_LD_EH_FRAME_CIEV3 `if test x"$gcc_cv_ld_eh_frame_ciev3" = xyes; then echo 1; else echo 0; fi` -_ACEOF - -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_eh_frame_ciev3" >&5 -$as_echo "$gcc_cv_ld_eh_frame_ciev3" >&6; } - { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker position independent executable support" >&5 $as_echo_n "checking linker position independent executable support... " >&6; } gcc_cv_ld_pie=no diff --git a/gcc/configure.ac b/gcc/configure.ac --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -6110,42 +6110,6 @@ if test x"$gcc_cv_ld_eh_frame_hdr" = xye fi AC_MSG_RESULT($gcc_cv_ld_eh_fram
[PATCH v2 4/8] LoongArch: Simplify {lsx_, lasx_x}vh{add, sub}w description
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX_XVHADDW_QU_DU): Remove. (UNSPEC_LASX_XVHSUBW_QU_DU): Remove. (lasx_xvhw_h_b): Remove. (lasx_xvhw_w_h): Remove. (lasx_xvhw_d_w): Remove. (lasx_xvhaddw_q_d): Remove. (lasx_xvhsubw_q_d): Remove. (lasx_xvhaddw_qu_du): Remove. (lasx_xvhsubw_qu_du): Remove. (reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead of gen_lasx_xvhaddw_q_d. (reduc_plus_scal_v8si): Likewise. * config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove. (UNSPEC_ASX_VHSUBW_Q_D): Remove. (UNSPEC_ASX_VHADDW_QU_DU): Remove. (UNSPEC_ASX_VHSUBW_QU_DU): Remove. (lsx_vhw_h_b): Remove. (lsx_vhw_w_h): Remove. (lsx_vhw_d_w): Remove. (lsx_vhaddw_q_d): Remove. (lsx_vhsubw_q_d): Remove. (lsx_vhaddw_qu_du): Remove. (lsx_vhsubw_qu_du): Remove. (reduc_plus_scal_v2di): Change the temporary register mode to V1TI, and pun the mode calling gen_vec_extractv2didi. (reduc_plus_scal_v4si): Change the temporary register mode to V1TI. * config/loongarch/simd.md (simd_hw__): New define_insn. (_vhw__): New define_expand. (_hw_q_d_punned): New define_expand. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with punned expand. (CODE_FOR_lsx_vhaddw_qu_du): Likewise. (CODE_FOR_lsx_vhsubw_q_d): Likewise. (CODE_FOR_lsx_vhsubw_qu_du): Likewise. (CODE_FOR_lasx_xvhaddw_q_d): Likewise. (CODE_FOR_lasx_xvhaddw_qu_du): Likewise. (CODE_FOR_lasx_xvhsubw_q_d): Likewise. (CODE_FOR_lasx_xvhsubw_qu_du): Likewise. 
--- gcc/config/loongarch/lasx.md | 126 + gcc/config/loongarch/loongarch-builtins.cc | 10 ++ gcc/config/loongarch/lsx.md| 108 +- gcc/config/loongarch/simd.md | 52 + 4 files changed, 69 insertions(+), 227 deletions(-) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 640fa028f1e..1dc11840187 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -100,10 +100,6 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVMADDWOD UNSPEC_LASX_XVMADDWOD2 UNSPEC_LASX_XVMADDWOD3 - UNSPEC_LASX_XVHADDW_Q_D - UNSPEC_LASX_XVHSUBW_Q_D - UNSPEC_LASX_XVHADDW_QU_DU - UNSPEC_LASX_XVHSUBW_QU_DU UNSPEC_LASX_XVADD_Q UNSPEC_LASX_XVSUB_Q UNSPEC_LASX_XVREPLVE @@ -1407,76 +1403,6 @@ (define_insn "fixuns_trunc2" (set_attr "cnv_mode" "") (set_attr "mode" "")]) -(define_insn "lasx_xvhw_h_b" - [(set (match_operand:V16HI 0 "register_operand" "=f") - (addsub:V16HI - (any_extend:V16HI - (vec_select:V16QI - (match_operand:V32QI 1 "register_operand" "f") - (parallel [(const_int 1) (const_int 3) -(const_int 5) (const_int 7) -(const_int 9) (const_int 11) -(const_int 13) (const_int 15) -(const_int 17) (const_int 19) -(const_int 21) (const_int 23) -(const_int 25) (const_int 27) -(const_int 29) (const_int 31)]))) - (any_extend:V16HI - (vec_select:V16QI - (match_operand:V32QI 2 "register_operand" "f") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6) -(const_int 8) (const_int 10) -(const_int 12) (const_int 14) -(const_int 16) (const_int 18) -(const_int 20) (const_int 22) -(const_int 24) (const_int 26) -(const_int 28) (const_int 30)])] - "ISA_HAS_LASX" - "xvhw.h.b\t%u0,%u1,%u2" - [(set_attr "type" "simd_int_arith") - (set_attr "mode" "V16HI")]) - -(define_insn "lasx_xvhw_w_h" - [(set (match_operand:V8SI 0 "register_operand" "=f") - (addsub:V8SI - (any_extend:V8SI - (vec_select:V8HI - (match_operand:V16HI 1 "register_operand" "f") - (parallel [(const_int 1) (const_int 3) -(const_int 5) (const_int 7) -(const_int 9) (const_int 11) -(const_int 13) 
(const_int 15)]))) - (any_extend:V8SI - (vec_select:V8HI - (match_operand:V16HI 2 "register_operand" "f") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6) -(const_int 8
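The semantics being re-expressed without UNSPECs can be modeled directly; a Python sketch of vhaddw.h.b (illustrative, following the odd/even vec_select structure visible in the removed patterns: operand 1 contributes the odd elements, operand 2 the even ones):

```python
def sext8(b):
    """Sign-extend an 8-bit value to a Python int."""
    return b - 256 if b >= 128 else b

def vhaddw_h_b(vj, vk):
    """Model of [x]vhaddw.h.b: each 16-bit result lane is the
    sign-extended odd byte of vj plus the sign-extended even byte of vk,
    mirroring (any_extend (vec_select odd)) + (any_extend (vec_select even))."""
    return [sext8(vj[2 * i + 1]) + sext8(vk[2 * i])
            for i in range(len(vj) // 2)]

vj = [0x01, 0xFF, 0x02, 0x7F] + [0] * 12
vk = [0x10, 0x00, 0x20, 0x00] + [0] * 12
print(vhaddw_h_b(vj, vk)[:2])  # [15, 159] i.e. (-1 + 16) and (127 + 32)
```

The unsigned variants (vhaddw.hu.bu etc.) drop the sign extension, matching any_extend covering both sign_extend and zero_extend in the unified pattern.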
[PATCH v2 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization
This series is intended to fix some test failures on vect-reduc-chain-*.c by adding the [su]dot_prod* expand for LSX and LASX vector modes. But the code base of the related instructions was not readable, so clean it up first (using the approach learnt from AArch64) before adding the expands. v1 => v2: - Only simplify vpick{ev,od}, not xvpick{ev,od} (where vect_par_cnst_even_or_odd_half is not suitable). - Keep {sign,zero}_extend out of vec_select. - Remove vect_par_cnst_{even,odd}_half for simd_hw__, to simplify the code and allow it to match the RTL in case the even half is selected for the left operand of addsub. Swap the operands if needed when outputting the asm. - Fix typos in commit subjects. - Mention V2TI in loongarch-modes.def. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (8): LoongArch: Try harder using vrepli instructions to materialize const vectors LoongArch: Allow moving TImode vectors LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description LoongArch: Simplify {lsx_,lasx_x}vh{add,sub}w description LoongArch: Simplify {lsx_,lasx_x}vmaddw description LoongArch: Simplify lsx_vpick description LoongArch: Implement vec_widen_mult_{even,odd}_* for LSX and LASX modes LoongArch: Implement [su]dot_prod* for LSX and LASX modes gcc/config/loongarch/constraints.md |2 +- gcc/config/loongarch/lasx.md | 1070 + gcc/config/loongarch/loongarch-builtins.cc| 60 + gcc/config/loongarch/loongarch-modes.def |5 +- gcc/config/loongarch/loongarch-protos.h |3 + gcc/config/loongarch/loongarch.cc | 50 +- gcc/config/loongarch/loongarch.md |2 +- gcc/config/loongarch/lsx.md | 1006 +--- gcc/config/loongarch/predicates.md| 27 + gcc/config/loongarch/simd.md | 390 +- gcc/testsuite/gcc.target/loongarch/vrepli.c | 15 + .../gcc.target/loongarch/wide-mul-reduc-1.c | 18 + .../gcc.target/loongarch/wide-mul-reduc-2.c | 18 + 13 files changed, 612 insertions(+), 2054 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c create mode 
100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c -- 2.48.1
[PATCH v2 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors
For a = (v4si){0x, 0x, 0x, 0x} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 It's because we only tried vrepli.d which wouldn't work. Try all vrepli instructions for const int vector materializing to fix it. gcc/ChangeLog: * config/loongarch/loongarch-protos.h (loongarch_const_vector_vrepli): New function prototype. * config/loongarch/loongarch.cc (loongarch_const_vector_vrepli): Implement. (loongarch_const_insns): Call loongarch_const_vector_vrepli instead of loongarch_const_vector_same_int_p. (loongarch_split_vector_move_p): Likewise. (loongarch_output_move): Use loongarch_const_vector_vrepli to pun operend[1] into a better mode if it's a const int vector, and decide the suffix of [x]vrepli with the new mode. * config/loongarch/constraints.md (YI): Call loongarch_const_vector_vrepli instead of loongarch_const_vector_same_int_p. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vrepli.c: New test. --- gcc/config/loongarch/constraints.md | 2 +- gcc/config/loongarch/loongarch-protos.h | 1 + gcc/config/loongarch/loongarch.cc | 34 ++--- gcc/testsuite/gcc.target/loongarch/vrepli.c | 15 + 4 files changed, 46 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md index a7c31c2c4e0..97a4e4e35d3 100644 --- a/gcc/config/loongarch/constraints.md +++ b/gcc/config/loongarch/constraints.md @@ -301,7 +301,7 @@ (define_constraint "YI" A replicated vector const in which the replicated value is in the range [-512,511]." 
(and (match_code "const_vector") - (match_test "loongarch_const_vector_same_int_p (op, mode, -512, 511)"))) + (match_test "loongarch_const_vector_vrepli (op, mode)"))) (define_constraint "YC" "@internal diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index b99f949a004..20acca690c8 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -121,6 +121,7 @@ extern bool loongarch_const_vector_same_int_p (rtx, machine_mode, extern bool loongarch_const_vector_shuffle_set_p (rtx, machine_mode); extern bool loongarch_const_vector_bitimm_set_p (rtx, machine_mode); extern bool loongarch_const_vector_bitimm_clr_p (rtx, machine_mode); +extern rtx loongarch_const_vector_vrepli (rtx, machine_mode); extern rtx loongarch_lsx_vec_parallel_const_half (machine_mode, bool); extern rtx loongarch_gen_const_int_vector (machine_mode, HOST_WIDE_INT); extern enum reg_class loongarch_secondary_reload_class (enum reg_class, diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index e9978370e8c..e036f802fde 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1846,6 +1846,28 @@ loongarch_const_vector_shuffle_set_p (rtx op, machine_mode mode) return true; } +rtx +loongarch_const_vector_vrepli (rtx x, machine_mode mode) +{ + int size = GET_MODE_SIZE (mode); + + if (GET_CODE (x) != CONST_VECTOR + || GET_MODE_CLASS (mode) != MODE_VECTOR_INT) +return NULL_RTX; + + for (scalar_int_mode elem_mode: {QImode, HImode, SImode, DImode}) +{ + machine_mode new_mode = + mode_for_vector (elem_mode, size / GET_MODE_SIZE (elem_mode)) + .require (); + rtx op = lowpart_subreg (new_mode, x, mode); + if (loongarch_const_vector_same_int_p (op, new_mode, -512, 511)) + return op; +} + + return NULL_RTX; +} + /* Return true if rtx constants of mode MODE should be put into a small data section. 
*/ @@ -2501,7 +2523,7 @@ loongarch_const_insns (rtx x) case CONST_VECTOR: if ((LSX_SUPPORTED_MODE_P (GET_MODE (x)) || LASX_SUPPORTED_MODE_P (GET_MODE (x))) - && loongarch_const_vector_same_int_p (x, GET_MODE (x), -512, 511)) + && loongarch_const_vector_vrepli (x, GET_MODE (x))) return 1; /* Fall through. */ case CONST_DOUBLE: @@ -4656,7 +4678,7 @@ loongarch_split_vector_move_p (rtx dest, rtx src) /* Check for vector set to an immediate const vector with valid replicated element. */ if (FP_REG_RTX_P (dest) - && loongarch_const_vector_same_int_p (src, GET_MODE (src), -512, 511)) + && loongarch_const_vector_vrepli (src, GET_MODE (src))) return false; /* Check for vector load zero immediate. */ @@ -4792,13 +4814,15 @@ loongarch_output_move (rtx *operands) && src_code == CONST_VECTOR && CONST_INT_P (CONST_VECTOR_ELT (src, 0))) { - gcc_assert (loongarch_const_vector_same_int_p (src, mode, -512, 511)); + operands[1] = loongarch_const_vector_vrepli (src, mode); + gcc_assert (operands[1]);
[PATCH v2 2/8] LoongArch: Allow moving TImode vectors
We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. For the preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX registers so we won't get a reload failure when we start to save TImode vectors in these registers. This implicitly depends on the vrepli optimization: without it we'd try "vrepli.q" which does not really exist and trigger an ICE. gcc/ChangeLog: * config/loongarch/lsx.md (mov): Remove. (movmisalign): Remove. (mov_lsx): Remove. * config/loongarch/lasx.md (mov): Remove. (movmisalign): Remove. (mov_lasx): Remove. * config/loongarch/simd.md (ALLVEC_TI): New mode iterator. (mov): Likewise. (mov_simd): New define_insn_and_split. --- gcc/config/loongarch/lasx.md | 40 -- gcc/config/loongarch/lsx.md | 36 --- gcc/config/loongarch/simd.md | 42 3 files changed, 42 insertions(+), 76 deletions(-) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index a37c85a25a4..d82ad61be60 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -699,46 +699,6 @@ (define_expand "lasx_xvrepli" DONE; }) -(define_expand "mov" - [(set (match_operand:LASX 0) - (match_operand:LASX 1))] - "ISA_HAS_LASX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - - -(define_expand "movmisalign" - [(set (match_operand:LASX 0) - (match_operand:LASX 1))] - "ISA_HAS_LASX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -;; 256-bit LASX modes can only exist in LASX registers or memory. 
-(define_insn "mov_lasx" - [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f") - (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))] - "ISA_HAS_LASX" - { return loongarch_output_move (operands); } - [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert") - (set_attr "mode" "") - (set_attr "length" "8,4,4,4,4")]) - - -(define_split - [(set (match_operand:LASX 0 "nonimmediate_operand") - (match_operand:LASX 1 "move_operand"))] - "reload_completed && ISA_HAS_LASX - && loongarch_split_move_p (operands[0], operands[1])" - [(const_int 0)] -{ - loongarch_split_move (operands[0], operands[1]); - DONE; -}) ;; LASX (define_insn "add3" diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index ca0066a21ed..bcc5ae85fb3 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -575,42 +575,6 @@ (define_insn "lsx_vshuf_" [(set_attr "type" "simd_sld") (set_attr "mode" "")]) -(define_expand "mov" - [(set (match_operand:LSX 0) - (match_operand:LSX 1))] - "ISA_HAS_LSX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -(define_expand "movmisalign" - [(set (match_operand:LSX 0) - (match_operand:LSX 1))] - "ISA_HAS_LSX" -{ - if (loongarch_legitimize_move (mode, operands[0], operands[1])) -DONE; -}) - -(define_insn "mov_lsx" - [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f,*r") - (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r,*r"))] - "ISA_HAS_LSX" -{ return loongarch_output_move (operands); } - [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert,simd_copy") - (set_attr "mode" "")]) - -(define_split - [(set (match_operand:LSX 0 "nonimmediate_operand") - (match_operand:LSX 1 "move_operand"))] - "reload_completed && ISA_HAS_LSX - && loongarch_split_move_p (operands[0], operands[1])" - [(const_int 0)] -{ - loongarch_split_move (operands[0], operands[1]); - DONE; -}) ;; Integer operations (define_insn "add3" diff --git 
a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index 7605b17d21e..61fc1ab20ad 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -130,6 +130,48 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3") ;; instruction here so we can avoid duplicating logics. ;; === + +;; Move + +;; Some immediate values in V1TI or V2TI may be stored in LSX or LASX +;; registers, thus we need to allow moving them for reload. +(define_mode_iterator ALLVEC_TI [ALLVEC +(V1TI "ISA_HAS_LSX") +(V2TI "ISA_HAS_LASX")]) + +(define_expand "mov" + [(set (match_operand:ALLVEC_TI 0) + (match_operand:ALLVEC_TI 1))] + "" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])) +DONE; +}) + +(define_expand "movmisalign" + [(set (match_operand:ALLVEC_TI 0) + (match_operand:ALLVEC_TI 1))] + "" +{ + if (loongarch_legitimize_move (mode, operands[0], operands[1])
Re: [PATCH v4] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]
Alexandre Oliva writes: > On Feb 6, 2025, Sam James wrote: > >> Richard Biener writes: >>> On Thu, Feb 6, 2025 at 2:41 PM Alexandre Oliva wrote: On Jan 27, 2025, Richard Biener wrote: > (I see the assert is no longer in the patch). That's because the assert went in as part of an earlier patch. I take it it should be backed out along with the to-be-split-out bits above, right? >>> >>> Yes. >>> >>> (IIRC there's also a PR tripping over this or a similar assert) > >> Right, PR118706. > > Thanks. I've added its testcase to the patch below, reverted the > assert, and dropped the other unwanted bits. Regstrapped on > x86_64-linux-gnu. Ok to install? Thanks. BTW, there's another for you at PR118805 (sorry). > > > > If decode_field_reference finds a load that accesses past the inner > object's size, bail out. > > Drop the too-strict assert. > > > for gcc/ChangeLog > > PR tree-optimization/118514 > PR tree-optimization/118706 > * gimple-fold.cc (decode_field_reference): Refuse to consider > merging out-of-bounds BIT_FIELD_REFs. > (make_bit_field_load): Drop too-strict assert. > * tree-eh.cc (bit_field_ref_in_bounds_p): Rename to... > (access_in_bounds_of_type_p): ... this. Change interface, > export. > (tree_could_trap_p): Adjust. > * tree-eh.h (access_in_bounds_of_type_p): Declare. > > for gcc/testsuite/ChangeLog > > PR tree-optimization/118514 > PR tree-optimization/118706 > * gcc.dg/field-merge-25.c: New. 
> --- > gcc/gimple-fold.cc| 11 ++- > gcc/testsuite/gcc.dg/field-merge-25.c | 15 +++ > gcc/tree-eh.cc| 25 + > gcc/tree-eh.h |1 + > 4 files changed, 31 insertions(+), 21 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/field-merge-25.c > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index 45485782cdf91..29191685a43c5 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT > *pbitsize, >|| bs <= shiftrt >|| offset != 0 >|| TREE_CODE (inner) == PLACEHOLDER_EXPR > - /* Reject out-of-bound accesses (PR79731). */ > - || (! AGGREGATE_TYPE_P (TREE_TYPE (inner)) > - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)), > -bp + bs) < 0) > + /* Reject out-of-bound accesses (PR79731, PR118514). */ > + || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp) >|| (INTEGRAL_TYPE_P (TREE_TYPE (inner)) > && !type_has_mode_precision_p (TREE_TYPE (inner > return NULL_TREE; > @@ -7859,11 +7857,6 @@ make_bit_field_load (location_t loc, tree inner, tree > orig_inner, tree type, >gimple *new_stmt = gsi_stmt (i); >if (gimple_has_mem_ops (new_stmt)) > gimple_set_vuse (new_stmt, reaching_vuse); > - gcc_checking_assert (! 
(gimple_assign_load_p (point) > - && gimple_assign_load_p (new_stmt)) > -|| (tree_could_trap_p (gimple_assign_rhs1 (point)) > -== tree_could_trap_p (gimple_assign_rhs1 > - (new_stmt; > } > >gimple_stmt_iterator gsi = gsi_for_stmt (point); > diff --git a/gcc/testsuite/gcc.dg/field-merge-25.c > b/gcc/testsuite/gcc.dg/field-merge-25.c > new file mode 100644 > index 0..e769b0ae7b846 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-25.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fno-tree-fre" } */ > + > +/* PR tree-optimization/118706 */ > + > +int a[1][1][3], b; > +int main() { > + int c = -1; > + while (b) { > +if (a[c][c][6]) > + break; > +if (a[0][0][0]) > + break; > + } > +} > diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc > index 7015189a2de83..a4d59954c0597 100644 > --- a/gcc/tree-eh.cc > +++ b/gcc/tree-eh.cc > @@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref) >return true; > } > > -/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known > to > - be in bounds for the referred operand type. */ > +/* Return true iff a BIT_FIELD_REF <(TYPE)???, SIZE, OFFSET> would access a > bit > + range that is known to be in bounds for TYPE. */ > > -static bool > -bit_field_ref_in_bounds_p (tree expr) > +bool > +access_in_bounds_of_type_p (tree type, poly_uint64 size, poly_uint64 offset) > { > - tree size_tree; > - poly_uint64 size_max, min, wid, max; > + tree type_size_tree; > + poly_uint64 type_size_max, min = offset, wid = size, max; > > - size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0))); > - if (!size_tree || !poly_int_tree_p (size_tree, &size_max)) > + type_size_tree = TYPE_SIZE (type); > + if (!type_size_tree || !poly_int_tree_p (type_size_tree, &type_size_max)) > retu
[PATCH, FYI] [testsuite] fix check-function-bodies usage
The existing usage comment for check-function-bodies is presumably a typo, as it doesn't match existing uses. Fix it. Tested on x86_64-linux-gnu. I'm going to install it as obvious if there are no objections in the next 24 hours. for gcc/testsuite/ChangeLog * lib/scanasm.exp (check-function-bodies): Fix usage comment. --- gcc/testsuite/lib/scanasm.exp |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp index beffedd5bce46..97935cb23c3cf 100644 --- a/gcc/testsuite/lib/scanasm.exp +++ b/gcc/testsuite/lib/scanasm.exp @@ -985,7 +985,7 @@ proc check_function_body { functions name body_regexp } { # Check the implementations of functions against expected output. Used as: # -# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR [MATCHED]]] } } +# { dg-final { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR [MATCHED]]] } } # # See sourcebuild.texi for details. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving "normal" is *not* inclusive
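For reference, the corrected directive appears in a testcase roughly like this (an illustrative sketch, not an actual testsuite file; the expected-body pattern is target-specific):

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-final { check-function-bodies "**" "" } } */

int
f (int x)
{
  return x + 1;
}

/* The expected assembly for f follows, introduced by the "**" PREFIX;
   the second directive argument above is the (empty) TERMINATOR, and
   "..." matches any run of lines.  */
/*
** f:
** ...
** ret
*/
```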
[PATCH v2 6/8] LoongArch: Simplify lsx_vpick description
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has a different semantic. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_i): Make it same as simdfmt for integer vector modes. (_f): New define_mode_attr. * config/loongarch/lsx.md (lsx_vpickev_b): Remove. (lsx_vpickev_h): Remove. (lsx_vpickev_w): Remove. (lsx_vpickev_w_f): Remove. (lsx_vpickod_b): Remove. (lsx_vpickod_h): Remove. (lsx_vpickod_w): Remove. (lsx_vpickev_w_f): Remove. (lsx_pick_evod_): New define_insn. (lsx_vpick_<_f>): New define_expand. --- gcc/config/loongarch/lsx.md | 142 ++- gcc/config/loongarch/simd.md | 24 +- 2 files changed, 47 insertions(+), 119 deletions(-) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index c7df04c6389..9d7254768ae 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1624,125 +1624,33 @@ (define_insn "lsx_nor_" [(set_attr "type" "simd_logic") (set_attr "mode" "")]) -(define_insn "lsx_vpickev_b" -[(set (match_operand:V16QI 0 "register_operand" "=f") - (vec_select:V16QI - (vec_concat:V32QI - (match_operand:V16QI 1 "register_operand" "f") - (match_operand:V16QI 2 "register_operand" "f")) - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6) - (const_int 8) (const_int 10) - (const_int 12) (const_int 14) - (const_int 16) (const_int 18) - (const_int 20) (const_int 22) - (const_int 24) (const_int 26) - (const_int 28) (const_int 30)])))] - "ISA_HAS_LSX" - "vpickev.b\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V16QI")]) - -(define_insn "lsx_vpickev_h" -[(set (match_operand:V8HI 0 "register_operand" "=f") - (vec_select:V8HI - (vec_concat:V16HI - (match_operand:V8HI 1 "register_operand" "f") - (match_operand:V8HI 2 "register_operand" "f")) - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6) - (const_int 8) (const_int 10) 
- (const_int 12) (const_int 14)])))] - "ISA_HAS_LSX" - "vpickev.h\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V8HI")]) - -(define_insn "lsx_vpickev_w" -[(set (match_operand:V4SI 0 "register_operand" "=f") - (vec_select:V4SI - (vec_concat:V8SI - (match_operand:V4SI 1 "register_operand" "f") - (match_operand:V4SI 2 "register_operand" "f")) - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6)])))] - "ISA_HAS_LSX" - "vpickev.w\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V4SI")]) - -(define_insn "lsx_vpickev_w_f" -[(set (match_operand:V4SF 0 "register_operand" "=f") - (vec_select:V4SF - (vec_concat:V8SF - (match_operand:V4SF 1 "register_operand" "f") - (match_operand:V4SF 2 "register_operand" "f")) - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6)])))] - "ISA_HAS_LSX" - "vpickev.w\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V4SF")]) - -(define_insn "lsx_vpickod_b" -[(set (match_operand:V16QI 0 "register_operand" "=f") - (vec_select:V16QI - (vec_concat:V32QI - (match_operand:V16QI 1 "register_operand" "f") - (match_operand:V16QI 2 "register_operand" "f")) - (parallel [(const_int 1) (const_int 3) - (const_int 5) (const_int 7) - (const_int 9) (const_int 11) - (const_int 13) (const_int 15) - (const_int 17) (const_int 19) - (const_int 21) (const_int 23) - (const_int 25) (const_int 27) - (const_int 29) (const_int 31)])))] - "ISA_HAS_LSX" - "vpickod.b\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") - (set_attr "mode" "V16QI")]) - -(define_insn "lsx_vpickod_h" -[(set (match_operand:V8HI 0 "register_operand" "=f") - (vec_select:V8HI - (vec_concat:V16HI - (match_operand:V8HI 1 "register_operand" "f") - (match_operand:V8HI 2 "register_operand" "f")) - (parallel [(const_int 1) (const_int 3) - (const_int 5) (const_int 7) - (const_int 9) (const_int 11) - (const_int 13) (const_int 15)])))] - "ISA_HAS_LSX" - "vpickod.h\t%w0,%w2,%w1" - [(set_attr "type" "simd_permute") 
- (set_attr "mode" "V8HI")]) - -(define_insn "lsx_vpickod_w" -[(set (match_operand:V4SI 0 "register_operand" "=f") - (vec_select:V4SI - (vec_concat:V8SI - (match_operand:V4SI 1 "register_operand" "f") -
[PATCH v2 5/8] LoongArch: Simplify {lsx_,lasx_x}vmaddw description
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove. (UNSPEC_LASX_XVMADDWEV2): Remove. (UNSPEC_LASX_XVMADDWEV3): Remove. (UNSPEC_LASX_XVMADDWOD): Remove. (UNSPEC_LASX_XVMADDWOD2): Remove. (UNSPEC_LASX_XVMADDWOD3): Remove. (lasx_xvmaddwev_h_b): Remove. (lasx_xvmaddwev_w_h): Remove. (lasx_xvmaddwev_d_w): Remove. (lasx_xvmaddwev_q_d): Remove. (lasx_xvmaddwod_h_b): Remove. (lasx_xvmaddwod_w_h): Remove. (lasx_xvmaddwod_d_w): Remove. (lasx_xvmaddwod_q_d): Remove. (lasx_xvmaddwev_q_du): Remove. (lasx_xvmaddwod_q_du): Remove. (lasx_xvmaddwev_h_bu_b): Remove. (lasx_xvmaddwev_w_hu_h): Remove. (lasx_xvmaddwev_d_wu_w): Remove. (lasx_xvmaddwev_q_du_d): Remove. (lasx_xvmaddwod_h_bu_b): Remove. (lasx_xvmaddwod_w_hu_h): Remove. (lasx_xvmaddwod_d_wu_w): Remove. (lasx_xvmaddwod_q_du_d): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove. (UNSPEC_LSX_VMADDWEV2): Remove. (UNSPEC_LSX_VMADDWEV3): Remove. (UNSPEC_LSX_VMADDWOD): Remove. (UNSPEC_LSX_VMADDWOD2): Remove. (UNSPEC_LSX_VMADDWOD3): Remove. (lsx_vmaddwev_h_b): Remove. (lsx_vmaddwev_w_h): Remove. (lsx_vmaddwev_d_w): Remove. (lsx_vmaddwev_q_d): Remove. (lsx_vmaddwod_h_b): Remove. (lsx_vmaddwod_w_h): Remove. (lsx_vmaddwod_d_w): Remove. (lsx_vmaddwod_q_d): Remove. (lsx_vmaddwev_q_du): Remove. (lsx_vmaddwod_q_du): Remove. (lsx_vmaddwev_h_bu_b): Remove. (lsx_vmaddwev_w_hu_h): Remove. (lsx_vmaddwev_d_wu_w): Remove. (lsx_vmaddwev_q_du_d): Remove. (lsx_vmaddwod_h_bu_b): Remove. (lsx_vmaddwod_w_hu_h): Remove. (lsx_vmaddwod_d_wu_w): Remove. (lsx_vmaddwod_q_du_d): Remove. * config/loongarch/simd.md (simd_maddw_evod__): New define_insn. (_vmaddw__): New define_expand. (simd_maddw_evod__hetero): New define_insn. 
(_vmaddw__u_): New define_expand. (_maddw_q_d_punned): New define_expand. (_maddw_q_du_d_punned): New define_expand. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it with the punned expand. (CODE_FOR_lsx_vmaddwev_q_du): Likewise. (CODE_FOR_lsx_vmaddwev_q_du_d): Likewise. (CODE_FOR_lsx_vmaddwod_q_d): Likewise. (CODE_FOR_lsx_vmaddwod_q_du): Likewise. (CODE_FOR_lsx_vmaddwod_q_du_d): Likewise. (CODE_FOR_lasx_xvmaddwev_q_d): Likewise. (CODE_FOR_lasx_xvmaddwev_q_du): Likewise. (CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise. (CODE_FOR_lasx_xvmaddwod_q_d): Likewise. (CODE_FOR_lasx_xvmaddwod_q_du): Likewise. (CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise. --- gcc/config/loongarch/lasx.md | 400 - gcc/config/loongarch/loongarch-builtins.cc | 14 + gcc/config/loongarch/lsx.md| 320 - gcc/config/loongarch/simd.md | 104 ++ 4 files changed, 118 insertions(+), 720 deletions(-) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 1dc11840187..4ac85b7fcf9 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -94,12 +94,6 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVPERMI_Q UNSPEC_LASX_XVPERMI_D - UNSPEC_LASX_XVMADDWEV - UNSPEC_LASX_XVMADDWEV2 - UNSPEC_LASX_XVMADDWEV3 - UNSPEC_LASX_XVMADDWOD - UNSPEC_LASX_XVMADDWOD2 - UNSPEC_LASX_XVMADDWOD3 UNSPEC_LASX_XVADD_Q UNSPEC_LASX_XVSUB_Q UNSPEC_LASX_XVREPLVE @@ -3122,400 +3116,6 @@ (define_insn "lasx_xvldrepl__insn_0" (set_attr "mode" "") (set_attr "length" "4")]) -;;XVMADDWEV.H.B XVMADDWEV.H.BU -(define_insn "lasx_xvmaddwev_h_b" - [(set (match_operand:V16HI 0 "register_operand" "=f") - (plus:V16HI - (match_operand:V16HI 1 "register_operand" "0") - (mult:V16HI - (any_extend:V16HI - (vec_select:V16QI - (match_operand:V32QI 2 "register_operand" "%f") - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6) - (const_int 8) (const_int 10) - (const_int 12) (const_int 14) - (const_int 16) (const_int 18) - (const_int 20) (const_int 22) - 
(const_int 24) (const_int 26) - (const_int 28) (const_int 30)]))) -
Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description
On Thu, 2025-02-13 at 17:01 +0800, Lulu Cheng wrote:
> Hi, Ruoyao:
>
> When will it be convenient for you to submit the v2 version of the patch?

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675672.html

> I am planning to merge the current patches and then test the optimal values
> for -malign-{functions,labels,jumps,loops} on that basis.

Thanks!

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
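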
Re: [PATCH] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism
On 2/12/25 23:30, Jason Merrill wrote: In the new `check_known_compiler_messages_nocache` procedure I use some Why is it not enough to look for the message with "[regexp" like check_alias_available does? Jason The goal was that I wanted to be able to query "the warnings/errors are *only* about this thing", rather than "warnings mention this thing". That said, since my use-case here is to give a boolean, the hypothetical case of "extra" messages has to be categorised in one or the other bucket. Since the final behaviour would be much the same -- possible "excess error" messages on targets which support __builtin_speculation_safe_value instead of on targets which don't -- a simple `regexp` would work for this patch just as well. Shall I make that change?
Re: [PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test
On Thu, 13 Feb 2025, Richard Biener wrote: > On Wed, 12 Feb 2025, Andrew Pinski wrote: > > > On Wed, Feb 12, 2025 at 4:04 AM Richard Biener wrote: > > > > > > The PR indicates a very specific issue with regard to SSA coalescing > > > failures because there's a pre IV increment loop exit test. While > > > IVOPTs created the desired IL we later simplify the exit test into > > > the undesirable form again. The following fixes this up during RTL > > > expansion where we try to improve coalescing of IVs. That seems > > > easier that trying to avoid the simplification with some weird > > > heuristics (it could also have been written this way). > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > OK for trunk? > > > > > > Thanks, > > > Richard. > > > > > > PR tree-optimization/86270 > > > * tree-outof-ssa.cc (insert_backedge_copies): Pattern > > > match a single conflict in a loop condition and adjust > > > that avoiding the conflict if possible. > > > > > > * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg > > > copies as well. > > > --- > > > gcc/testsuite/gcc.target/i386/pr86270.c | 3 ++ > > > gcc/tree-outof-ssa.cc | 49 ++--- > > > 2 files changed, 47 insertions(+), 5 deletions(-) > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c > > > b/gcc/testsuite/gcc.target/i386/pr86270.c > > > index 68562446fa4..89b9aeb317a 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pr86270.c > > > +++ b/gcc/testsuite/gcc.target/i386/pr86270.c > > > @@ -13,3 +13,6 @@ test () > > > > > > /* Check we do not split the backedge but keep nice loop form. */ > > > /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */ > > > +/* Check we do not end up with reg-reg moves from a pre-increment IV > > > + exit test. 
*/ > > > +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, > > > %\?\[er\].x" } } */ > > > diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc > > > index d340d4ba529..f285c81599e 100644 > > > --- a/gcc/tree-outof-ssa.cc > > > +++ b/gcc/tree-outof-ssa.cc > > > @@ -1259,10 +1259,9 @@ insert_backedge_copies (void) > > > if (gimple_nop_p (def) > > > || gimple_code (def) == GIMPLE_PHI) > > > continue; > > > - tree name = copy_ssa_name (result); > > > - gimple *stmt = gimple_build_assign (name, result); > > > imm_use_iterator imm_iter; > > > gimple *use_stmt; > > > + auto_vec uses; > > > /* The following matches trivially_conflicts_p. */ > > > FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result) > > > { > > > @@ -1273,11 +1272,51 @@ insert_backedge_copies (void) > > > { > > > use_operand_p use; > > > FOR_EACH_IMM_USE_ON_STMT (use, imm_iter) > > > - SET_USE (use, name); > > > + uses.safe_push (use); > > > } > > > } > > > - gimple_stmt_iterator gsi = gsi_for_stmt (def); > > > - gsi_insert_before (&gsi, stmt, GSI_SAME_STMT); > > > + /* When there is just a conflicting statement try to > > > +adjust that to refer to the new definition. > > > +In particular for now handle a conflict with the > > > +use in a (exit) condition with a NE compare, > > > +replacing a pre-IV-increment compare with a > > > +post-IV-increment one. 
*/ > > > + if (uses.length () == 1 > > > + && is_a (USE_STMT (uses[0])) > > > + && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR > > > + && is_gimple_assign (def) > > > + && gimple_assign_rhs1 (def) == result > > > + && (gimple_assign_rhs_code (def) == PLUS_EXPR > > > + || gimple_assign_rhs_code (def) == MINUS_EXPR > > > + || gimple_assign_rhs_code (def) == > > > POINTER_PLUS_EXPR) > > > + && TREE_CODE (gimple_assign_rhs2 (def)) == > > > INTEGER_CST) > > > + { > > > + gcond *cond = as_a (USE_STMT (uses[0])); > > > + tree *adj; > > > + if (gimple_cond_lhs (cond) == result) > > > + adj = gimple_cond_rhs_ptr (cond); > > > + else > > > + adj = gimple_cond_lhs_ptr (cond); > > > + tree name = copy_ssa_name (result); > > > > Should this be `copy_ssa_name (*adj)`? Since the new name is based on > > `*adj` rather than based on the result. > > Good point, I've adjusted this in my local copy. Ah, but i
[PATCH][v2] tree-optimization/86270 - improve SSA coalescing for loop exit test
The PR indicates a very specific issue with regard to SSA coalescing failures because there's a pre IV increment loop exit test. While IVOPTs created the desired IL we later simplify the exit test into the undesirable form again. The following fixes this up during RTL expansion where we try to improve coalescing of IVs. That seems easier that trying to avoid the simplification with some weird heuristics (it could also have been written this way). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. OK? Thanks, Richard. PR tree-optimization/86270 * tree-outof-ssa.cc (insert_backedge_copies): Pattern match a single conflict in a loop condition and adjust that avoiding the conflict if possible. * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg copies as well. --- gcc/testsuite/gcc.target/i386/pr86270.c | 3 ++ gcc/tree-outof-ssa.cc | 51 ++--- 2 files changed, 49 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c b/gcc/testsuite/gcc.target/i386/pr86270.c index 68562446fa4..89b9aeb317a 100644 --- a/gcc/testsuite/gcc.target/i386/pr86270.c +++ b/gcc/testsuite/gcc.target/i386/pr86270.c @@ -13,3 +13,6 @@ test () /* Check we do not split the backedge but keep nice loop form. */ /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */ +/* Check we do not end up with reg-reg moves from a pre-increment IV + exit test. */ +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" } } */ diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc index d340d4ba529..1b5b67c2e2b 100644 --- a/gcc/tree-outof-ssa.cc +++ b/gcc/tree-outof-ssa.cc @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-outof-ssa.h" #include "dojump.h" #include "internal-fn.h" +#include "gimple-fold.h" /* FIXME: A lot of code here deals with expanding to RTL. All that code should be in cfgexpand.cc. 
*/ @@ -1259,10 +1260,9 @@ insert_backedge_copies (void) if (gimple_nop_p (def) || gimple_code (def) == GIMPLE_PHI) continue; - tree name = copy_ssa_name (result); - gimple *stmt = gimple_build_assign (name, result); imm_use_iterator imm_iter; gimple *use_stmt; + auto_vec uses; /* The following matches trivially_conflicts_p. */ FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result) { @@ -1273,11 +1273,52 @@ insert_backedge_copies (void) { use_operand_p use; FOR_EACH_IMM_USE_ON_STMT (use, imm_iter) - SET_USE (use, name); + uses.safe_push (use); } } - gimple_stmt_iterator gsi = gsi_for_stmt (def); - gsi_insert_before (&gsi, stmt, GSI_SAME_STMT); + /* When there is just a conflicting statement try to +adjust that to refer to the new definition. +In particular for now handle a conflict with the +use in a (exit) condition with a NE compare, +replacing a pre-IV-increment compare with a +post-IV-increment one. */ + if (uses.length () == 1 + && is_a (USE_STMT (uses[0])) + && (gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR + || gimple_cond_code (USE_STMT (uses[0])) == EQ_EXPR) + && is_gimple_assign (def) + && gimple_assign_rhs1 (def) == result + && (gimple_assign_rhs_code (def) == PLUS_EXPR + || gimple_assign_rhs_code (def) == MINUS_EXPR + || gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR) + && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST) + { + gcond *cond = as_a (USE_STMT (uses[0])); + tree *adj; + if (gimple_cond_lhs (cond) == result) + adj = gimple_cond_rhs_ptr (cond); + else + adj = gimple_cond_lhs_ptr (cond); + gimple_stmt_iterator gsi = gsi_for_stmt (cond); + tree newval + = gimple_build (&gsi, true, GSI_SAME_STMT, + UNKNOWN_LOCATION, + gimple_assign_rhs_code (def), + TREE_TYPE (*adj), + *adj, gimple_assign_rhs2 (def)); + *adj = newval; + SET_USE (uses[0], arg); + update_stmt (cond); + } +
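Concretely, the adjustment replaces a pre-increment exit compare with a post-increment one, roughly as follows (a sketch with invented SSA names, not actual dump output):

```
  # before: i_1 is live across the definition of i_2, so the two
  # SSA names conflict and cannot be coalesced into one register.
  i_2 = i_1 + 1;
  if (i_1 != n_3) goto loop; else goto exit;

  # after: the compare uses the post-increment value against an
  # adjusted bound, so i_1 dies at the increment and i_1/i_2 coalesce.
  i_2 = i_1 + 1;
  n_4 = n_3 + 1;
  if (i_2 != n_4) goto loop; else goto exit;
```

The rewrite is valid for NE/EQ compares because adding the same constant to both sides preserves (in)equality in modular arithmetic.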
Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]
On Thu, 13 Feb 2025, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs, because we have multiple levels of > DECL_VALUE_EXPR VAR_DECLs: > character(kind=1) id_string[1:.id_string] [value-expr: *id_string.55]; > character(kind=1)[1:.id_string] * id_string.55 [value-expr: > FRAME.107.id_string.55]; > integer(kind=8) .id_string [value-expr: FRAME.107..id_string]; > id_string is the user variable mentioned in BLOCK_VARS, it has > DECL_VALUE_EXPR because it is a VLA, id_string.55 is a temporary created by > gimplify_vla_decl as the address that points to the start of the VLA, what > is normally used in the IL to access it. But as this artificial var is then > used inside of a nested function, tree-nested.cc adds DECL_VALUE_EXPR to it > too and moves the actual value into the FRAME.107 object's member. > Now, remove_unused_locals removes id_string.55 (and various other VAR_DECLs) > from cfun->local_decls, simply because it is not mentioned in the IL at all > (neither is id_string itself, but that is kept in BLOCK_VARS as it has > DECL_VALUE_EXPR). So, after this point, id_string.55 tree isn't referenced > from > anywhere but id_string's DECL_VALUE_EXPR. Next GC collection is triggered, > and we are unlucky enough that in the value_expr_for_decl hash table > (underlying hash map for DECL_VALUE_EXPR) the id_string.55 entry comes > before the id_string entry. id_string is ggc_marked_p because it is > referenced from BLOCK_VARS, but id_string.55 is not, as we don't mark > DECL_VALUE_EXPR anywhere but by gt_cleare_cache on value_expr_for_decl. > But gt_cleare_cache does two things, it calls clear_slots on entries > where the key is not ggc_marked_p (so the id_string.55 mapping to > FRAME.107.id_string.55 is lost and DECL_VALUE_EXPR (id_string.55) becomes > NULL) but then later we see id_string entry, which is ggc_marked_p, so mark > the whole hash table entry, which sets ggc_set_mark on id_string.55. But > at this point its DECL_VALUE_EXPR is lost. 
> Later during dwarf2out.cc we want to emit DW_AT_location for id_string, see > it has DECL_VALUE_EXPR, so emit it as indirection of id_string.55 for which > we again lookup DECL_VALUE_EXPR as it has DECL_HAS_VALUE_EXPR_P, but as it > is NULL, we ICE, instead of finding it is a subobject of FRAME.107 for which > we can find its stack location. > > Now, as can be seen in the PR, I've tried to tweak tree-ssa-live.cc so that > it would keep id_string.55 in cfun->local_decls; that prohibits it from > the DECL_VALUE_EXPR of it being GC until expansion, but then we shrink and > free cfun->local_decls completely and so GC at that point still can throw > it away. > > The following patch adds an extension to the GTY ((cache)) option, before > calling the gt_cleare_cache on some hash table by specifying > GTY ((cache ("somefn"))) it calls somefn on that hash table as well. > And this extra hook can do any additional ggc_set_mark needed so that > gt_cleare_cache preserves everything that is actually needed and throws > away the rest. > > In order to make it just 2 pass rather than up to n passes - (if we had > say > id1 -> something, id2 -> x(id1), id3 -> x(id2), id4 -> x(id3), id5 -> x(id4) > in the value_expr_for_decl hash table in that order (where idN are VAR_DECLs > with DECL_HAS_VALUE_EXPR_P, id5 is the only one mentioned from outside and > idN -> X stands for idN having DECL_VALUE_EXPR X, something for some > arbitrary tree and x(idN) for some arbitrary tree which mentions idN > variable) and in each pass just marked the to part of entries with > ggc_marked_p base.from we'd need to repeat until we don't mark anything) > the patch calls walk_tree on DECL_VALUE_EXPR of the marked trees and if it > finds yet unmarked tree, it marks it and walks its DECL_VALUE_EXPR as well > the same way. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 
So what this basically does is ensure we mark DECL_VALUE_EXPR when VAR is marked which isn't done when marking a tree node. That you special-case the hashtable walker is a workaround for us not being able to say struct GTY((mark_extra_stuff)) tree_decl_with_vis { on 'tree' (or specifically the structs for a VAR_DECL). And that we rely on gengtype producing the 'tree' marker. So we rely on the hashtable keeping referenced trees live. OK. Thanks, Richard. > 2025-02-13 Jakub Jelinek > > PR debug/118790 > * gengtype.cc (write_roots): Remove cache variable, instead break from > the loop on match and test o for NULL. If the cache option has > non-empty string argument, call the specified function with v->name > as argument before calling gt_cleare_cache on it. > * tree.cc (gt_value_expr_mark_2, gt_value_expr_mark_1, > gt_value_expr_mark): New functions. > (value_expr_for_decl): Use GTY ((cache ("gt_value_expr_mark"))) rather > than just GTY ((cache)). > * doc/gty.texi (cache): Document optional argument of cache option. > >
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/2025 4:12 AM, Vineet Gupta wrote: On 2/13/25 14:17, Robin Dapp wrote: Other thoughts? The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff we can't/don't model in the pipeline, but I have no idea how to model the VL=0 case there. Maybe so, but what Edwin is doing looks sensible enough. It wouldn't be the first time a hook got (ab)used in ways that weren't part of the original intent. I don't fully understand what's happening. So the hoisting is being done speculatively here? And it just happens to be "bad" because that might cause a VL=0 case. But are we sure a lack of speculation cannot cause such cases? Exactly. My gut feeling w/o deep dive was this seemed like papering over the issue. BTW what exactly is speculative scheduling ? As in what is it actually trying to schedule ahead ? Also, why doesn't the vsetvl pass fix the situation? IMHO we need to understand the problem more thoroughly before changing things. In the end LCM minimizes the number of vsetvls and inserts them at the "earliest" point. If that is not sufficient I'd say we need modify the constraints (maybe on a per-uarch basis)? As far as LCM is concerned it is hoisting the insn to the optimal spot. However there's some additional logic such as in can_use_next_avl_p () which influences if things can be moved around. Since sched1 put the vsetvl right before the branch, that was always determined to be the "earliest" point because it was now available on all outgoing edges. Without the vsetvl right before the branch, the "earliest" point to insert the vsetvls was determined to be the beginning of each basic block. I did try adding some additional logic to adjust the way vsetvl fusion occurs across basic blocks in these scenarios i.e. performing the fusion in the opposite manner (breaking lcm guarantees); however, from my testing, fusing two vsetvls didn't actually remove the fused expression from the vinfo list. 
I'm not sure if that's intended but as a result, phase 3 would remove the fused block and use the vinfo that should've been fused into the other. That won't help with the problem here but might with others. Right this needs to be evaluated independently with both icounts and BPI3 runs to see if anything falls out. -Vineet I'll add an opt flag to gate this for testing purposes. Edwin
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 11:57 AM, Robin Dapp wrote: I did try adding some additional logic to adjust the way vsetvl fusion occurs across basic blocks in these scenarios i.e. performing the fusion in the opposite manner (breaking lcm guarantees); however, from my testing, fusing two vsetvls didn't actually remove the fused expression from the vinfo list. I'm not sure if that's intended but as a result, phase 3 would remove the fused block and use the vinfo that should've been fused into the other. It depends on the specific example but keeping deleted vsetvls/infos around has a purpose because it helps delete other vsetvls still. I don't recall details but I remember having at least a few examples for it. Yea, that can certainly happen with LCM based algorithms when computing the availability and anticipatable sets. Jeff
Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]
On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> -- >8 --
> As the manual states, using "-fhardened -fstack-protector" will produce
> a warning because -fhardened wants to enable -fstack-protector-strong,
> but it can't since it's been overridden by the weaker -fstack-protector.
>
> -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> produce the same warning.  But we don't detect this combination, so
> this patch fixes it.  I also renamed a variable to better reflect its
> purpose.
>
> Also don't check warn_hardened in process_command, since it's always
> true there.
>
> Also tweak wording in the manual as Jon Wakely suggested on IRC.
>
> 	PR driver/117739
>
> gcc/ChangeLog:
>
> 	* doc/invoke.texi: Tweak wording for -Whardened.
> 	* gcc.cc (driver_handle_option): If -z lazy or -z norelro was
> 	specified, don't enable linker hardening.
> 	(process_command): Don't check warn_hardened.
>
> gcc/testsuite/ChangeLog:
>
> 	* c-c++-common/fhardened-16.c: New test.
> 	* c-c++-common/fhardened-17.c: New test.
> 	* c-c++-common/fhardened-18.c: New test.
> 	* c-c++-common/fhardened-19.c: New test.
> 	* c-c++-common/fhardened-20.c: New test.
> 	* c-c++-common/fhardened-21.c: New test.

LGTM.

Jakub
[PATCH] tree: Fix up the DECL_VALUE_EXPR GC marking [PR118790]
Hi!

The ggc_set_mark call in gt_value_expr_mark_2 is actually wrong, that just marks the VAR_DECL itself, but doesn't mark the subtrees of it (type etc.). So, I think we need to test ggc_marked_p for whether it is marked or not, if not marked walk the DECL_VALUE_EXPR and then gt_ggc_mx mark the VAR_DECL that was determined not marked and needs to be marked now.

One option would be to call gt_ggc_mx (t) right after the DECL_VALUE_EXPR walking, but I'm a little bit worried that the subtree marking could mark other VAR_DECLs (e.g. seen from DECL_SIZE or TREE_TYPE and the like) and if they would be DECL_HAS_VALUE_EXPR_P we might not walk their DECL_VALUE_EXPR anymore later. So, the patch defers the gt_ggc_mx calls until we've walked all the DECL_VALUE_EXPRs directly or indirectly connected to already marked VAR_DECLs.

Ok for trunk if this passes bootstrap/regtest?

2025-02-13  Jakub Jelinek

	PR debug/118790
	* tree.cc (struct gt_value_expr_mark_data): New type.
	(gt_value_expr_mark_2): Don't call ggc_set_mark, instead check
	ggc_marked_p.  Treat data as gt_value_expr_mark_data * with pset
	in it rather than address of the pset itself and push to be marked
	VAR_DECLs into to_mark vec.
	(gt_value_expr_mark_1): Change argument from hash_set<tree> * to
	gt_value_expr_mark_data * and find pset in it.
	(gt_value_expr_mark): Pass to traverse_noresize address of
	gt_value_expr_mark_data object rather than hash_table and for all
	entries in the to_mark vector after the traversal call gt_ggc_mx.

--- gcc/tree.cc.jj	2025-02-13 14:14:44.330394074 +0100
+++ gcc/tree.cc	2025-02-13 16:24:39.609106712 +0100
@@ -211,6 +211,11 @@ struct cl_option_hasher : ggc_cache_ptr_

 static GTY ((cache)) hash_table<cl_option_hasher> *cl_option_hash_table;

+struct gt_value_expr_mark_data {
+  hash_set<tree> pset;
+  auto_vec<tree> to_mark;
+};
+
 /* Callback called through walk_tree_1 to discover DECL_HAS_VALUE_EXPR_P
    VAR_DECLs which weren't marked yet, in that case marks them and walks
    their DECL_VALUE_EXPR expressions.  */
@@ -219,11 +224,12 @@ static tree
 gt_value_expr_mark_2 (tree *tp, int *, void *data)
 {
   tree t = *tp;
-  if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t) && !ggc_set_mark (t))
+  if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t) && !ggc_marked_p (t))
     {
       tree dve = DECL_VALUE_EXPR (t);
-      walk_tree_1 (&dve, gt_value_expr_mark_2, data,
-		   (hash_set<tree> *) data, NULL);
+      gt_value_expr_mark_data *d = (gt_value_expr_mark_data *) data;
+      walk_tree_1 (&dve, gt_value_expr_mark_2, data, &d->pset, NULL);
+      d->to_mark.safe_push (t);
     }
   return NULL_TREE;
 }
@@ -232,10 +238,10 @@ gt_value_expr_mark_2 (tree *tp, int *, v
    value_expr_for_decl hash table.  */

 int
-gt_value_expr_mark_1 (tree_decl_map **e, hash_set<tree> *pset)
+gt_value_expr_mark_1 (tree_decl_map **e, gt_value_expr_mark_data *data)
 {
   if (ggc_marked_p ((*e)->base.from))
-    walk_tree_1 (&(*e)->to, gt_value_expr_mark_2, pset, pset, NULL);
+    walk_tree_1 (&(*e)->to, gt_value_expr_mark_2, data, &data->pset, NULL);
   return 1;
 }
@@ -255,8 +261,11 @@ gt_value_expr_mark (hash_table
-  hash_set<tree> pset;
-  h->traverse_noresize <hash_set<tree> *, gt_value_expr_mark_1> (&pset);
+  gt_value_expr_mark_data data;
+  h->traverse_noresize <gt_value_expr_mark_data *, gt_value_expr_mark_1>
+    (&data);
+  for (auto v : data.to_mark)
+    gt_ggc_mx (v);
 }

 /* General tree->tree mapping structure for use in hash tables.  */

Jakub
Re: [patch, fortran] PR117430 gfortran allows type(C_ptr) in I/O list
On 12.02.25 at 21:49, Jerry D wrote:

The attached patch is fairly obvious. The use of notify_std is changed to a gfc_error. Several test cases had to be adjusted. Regression tested on x86_64. OK for trunk?

This is not a review, just some random comments on the testsuite changes by your patch:

diff --git a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
index 4c2a7d657ee..92bfca4363d 100644
--- a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
+++ b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
@@ -1,5 +1,4 @@
 ! { dg-do compile }
-! { dg-options "" }
 !
 ! PR fortran/56378
 ! PR fortran/52426
@@ -24,5 +23,5 @@ contains
 end module

 use iso_c_binding
-print *, c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" }
+i = c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" }

^^^ i is not declared a type(c_ptr)

 end
diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
index 4ce1c6809e4..834570cb74d 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-options "-std=gnu" }
 ! This test case exists because gfortran had an error in converting the
 ! expressions for the derived types from iso_c_binding in some cases.
 module c_ptr_tests_10
@@ -7,7 +6,7 @@ module c_ptr_tests_10
 contains
 subroutine sub0() bind(c)
-print *, 'c_null_ptr is: ', c_null_ptr
+print *, 'c_null_ptr is: ', transfer (cptr, C_LONG_LONG)

This does not do what one naively might think.

  transfer (cptr, C_LONG_LONG) == transfer (cptr, 0)

You probably want:

  transfer (cptr, 0_C_INTPTR_T)

 end subroutine sub0
 end module c_ptr_tests_10
diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
index 5a32553b8c5..711b9c157d4 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
@@ -16,9 +16,9 @@ contains
 type(myF90Derived), pointer :: my_f90_type_ptr

 my_f90_type%my_c_ptr = c_null_ptr
-print *, 'my_f90_type is: ', my_f90_type%my_c_ptr
+print *, 'my_f90_type is: ', transfer(my_f90_type%my_c_ptr, C_LONG_LONG)
 my_f90_type_ptr => my_f90_type
-print *, 'my_f90_type_ptr is: ', my_f90_type_ptr%my_c_ptr
+print *, 'my_f90_type_ptr is: ', transfer(my_f90_type_ptr%my_c_ptr, C_LONG_LONG)
 end subroutine sub0
 end module c_ptr_tests_9

Likewise.

diff --git a/gcc/testsuite/gfortran.dg/init_flag_17.f90 b/gcc/testsuite/gfortran.dg/init_flag_17.f90
index 401830fccbc..8bb9f7b1ef7 100644
--- a/gcc/testsuite/gfortran.dg/init_flag_17.f90
+++ b/gcc/testsuite/gfortran.dg/init_flag_17.f90
@@ -19,8 +19,8 @@ program init_flag_17
 type(ty) :: t

- print *, t%ptr
- print *, t%fptr
+ print *, transfer(t%ptr, c_long_long)
+ print *, transfer(t%fptr, c_long_long)
 end program

Likewise.

diff --git a/gcc/testsuite/gfortran.dg/pr32601_1.f03 b/gcc/testsuite/gfortran.dg/pr32601_1.f03
index a297e1728ec..1a48419112d 100644
--- a/gcc/testsuite/gfortran.dg/pr32601_1.f03
+++ b/gcc/testsuite/gfortran.dg/pr32601_1.f03
@@ -4,9 +4,9 @@
 ! PR fortran/32601
 use, intrinsic :: iso_c_binding, only: c_loc, c_ptr
 implicit none
-
+integer i
 ! This was causing an ICE, but is an error because the argument to C_LOC
 ! needs to be a variable.
-print *, c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" }
+i = c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" }
 end

Again, i should be declared as type(c_ptr).

Cheers,
Harald

Regards,
Jerry

Author: Jerry DeLisle
Date:   Tue Feb 11 20:57:50 2025 -0800

    Fortran: gfortran allows type(C_ptr) in I/O list

    Before this patch, gfortran was accepting invalid use of type(c_ptr)
    in I/O statements. The fix affects several existing test cases so no
    new test case needed. Existing tests were modified to pass by either
    using the transfer function to convert to an acceptable value or
    using an assignment to a like type (non-I/O).

    PR fortran/117430

    gcc/fortran/ChangeLog:

    	* resolve.cc (resolve_transfer): Issue the error with no
    	exceptions allowed.

    gcc/testsuite/ChangeLog:

    	* gfortran.dg/c_loc_test_17.f90: Modify to pass.
    	* gfortran.dg/c_ptr_tests_10.f03: Likewise.
    	* gfortran.dg/c_ptr_tests_16.f90: Likewise.
    	* gfortran.dg/c_ptr_tests_9.f03: Likewise.
    	* gfortran.dg/init_flag_17.f90: Likewise.
    	* gfortran.dg/pr32601_1.f03: Likewise.
[PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label
Increment LABEL_NUSES when using minipool_vector_label to avoid the zero use count on minipool_vector_label.

	PR target/118866
	* config/arm/arm.cc (arm_reorg): Increment LABEL_NUSES when
	using minipool_vector_label.

--
H.J.

From 91907dc6d948bf256dfa95a161af783df44b1b65 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 14 Feb 2025 05:25:47 +0800
Subject: [PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label

Increment LABEL_NUSES when using minipool_vector_label to avoid the
zero use count on minipool_vector_label.

	PR target/118866
	* config/arm/arm.cc (arm_reorg): Increment LABEL_NUSES when
	using minipool_vector_label.

Signed-off-by: H.J. Lu
---
 gcc/config/arm/arm.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index a95ddf8201f..2e3ffdd2607 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19787,6 +19787,7 @@ arm_reorg (void)
 		 gen_rtx_LABEL_REF (VOIDmode, minipool_vector_label),
 		 this_fix->minipool->offset);
+	      LABEL_NUSES (minipool_vector_label) += 1;
 	      *this_fix->loc = gen_rtx_MEM (this_fix->mode, addr);
 	    }
--
2.48.1
[pushed] c++: omp declare variant tweak
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --
In r15-6707 I changed this function to use build_stub_object to more simply produce the right type, but it occurs to me that forward_parm would be even better, specifically for the diagnostic. This changes nothing with respect to PR118791.

gcc/cp/ChangeLog:

	* decl.cc (omp_declare_variant_finalize_one): Use forward_parm.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/declare-variant-3.C: Adjust diagnostic.
	* g++.dg/gomp/declare-variant-5.C: Adjust diagnostic.
---
 gcc/cp/decl.cc                                | 2 +-
 gcc/testsuite/g++.dg/gomp/declare-variant-3.C | 8 ++++----
 gcc/testsuite/g++.dg/gomp/declare-variant-5.C | 8 ++++----
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7f7f4938f2c..df4e66798b1 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8462,7 +8462,7 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
       if (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE)
 	parm = DECL_CHAIN (parm);
       for (; parm; parm = DECL_CHAIN (parm))
-	vec_safe_push (args, build_stub_object (TREE_TYPE (parm)));
+	vec_safe_push (args, forward_parm (parm));

       unsigned nappend_args = 0;
       tree append_args_list = TREE_CHAIN (TREE_CHAIN (chain));
diff --git a/gcc/testsuite/g++.dg/gomp/declare-variant-3.C b/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
index 8c0cfd218ad..fdf030fc429 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
@@ -86,8 +86,8 @@ struct E { int e; };
 void fn19 (E, int);

-#pragma omp declare variant (fn19)match(user={condition(0)}) // { dg-error {could not convert 'std::declval<int>\(\)' from 'int' to 'E'} }
-void fn20 (int, E);
+#pragma omp declare variant (fn19)match(user={condition(0)}) // { dg-error {could not convert 'i' from 'int' to 'E'} }
+void fn20 (int i, E e);

 struct F { operator int () const { return 42; } int f; };
 void fn21 (int, F);
@@ -95,8 +95,8 @@ void fn21 (int, F);
 #pragma omp declare variant ( fn21 ) match (user = { condition ( 1 - 1 ) } ) // { dg-error "variant 'void fn21\\\(int, F\\\)' and base 'void fn22\\\(F, F\\\)' have incompatible types" }
 void fn22 (F, F);

-#pragma omp declare variant (fn19) match (user={condition(0)}) // { dg-error {could not convert 'std::declval<F>\(\)' from 'F' to 'E'} }
-void fn23 (F, int);
+#pragma omp declare variant (fn19) match (user={condition(0)}) // { dg-error {could not convert 'f' from 'F' to 'E'} }
+void fn23 (F f, int i);

 void fn24 (int);
 struct U { int u; };
diff --git a/gcc/testsuite/g++.dg/gomp/declare-variant-5.C b/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
index a4747ac030b..f3697f66aba 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
@@ -74,8 +74,8 @@ struct E { int e; };
 void fn19 (E, int) {}

-#pragma omp declare variant (fn19)match(user={condition(0)}) // { dg-error {could not convert 'std::declval<int>\(\)' from 'int' to 'E'} }
-void fn20 (int, E) {}
+#pragma omp declare variant (fn19)match(user={condition(0)}) // { dg-error {could not convert 'i' from 'int' to 'E'} }
+void fn20 (int i, E e) {}

 struct F { operator int () const { return 42; } int f; };
 void fn21 (int, F) {}
@@ -83,8 +83,8 @@ void fn21 (int, F) {}
 #pragma omp declare variant ( fn21 ) match (user = { condition ( 1 - 1 ) } ) // { dg-error "variant 'void fn21\\\(int, F\\\)' and base 'void fn22\\\(F, F\\\)' have incompatible types" }
 void fn22 (F, F) {}

-#pragma omp declare variant (fn19) match (user={condition(0)}) // { dg-error {could not convert 'std::declval<F>\(\)' from 'F' to 'E'} }
-void fn23 (F, int) {}
+#pragma omp declare variant (fn19) match (user={condition(0)}) // { dg-error {could not convert 'f' from 'F' to 'E'} }
+void fn23 (F f, int i) {}

 void fn24 (int);
 struct U { int u; };

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
prerequisite-patch-id: cf6b02f09f22e626404250f9e5fc33e6e0351db2
--
2.48.1
[pushed] c++: use -Wprio-ctor-dtor for attribute init_priority
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --
gcc/cp/ChangeLog:

	* tree.cc (handle_init_priority_attribute): Use OPT_Wprio_ctor_dtor.

gcc/testsuite/ChangeLog:

	* g++.dg/special/initp1.C: Test disabling -Wprio-ctor-dtor.
---
 gcc/cp/tree.cc                        | 3 ++-
 gcc/testsuite/g++.dg/special/initp1.C | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 79bc74fa2b7..bf84fb6bcec 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5335,7 +5335,8 @@ handle_init_priority_attribute (tree* node,
 	  && !in_system_header_at (input_location))
 	{
 	  warning
-	    (0, "requested %<init_priority%> %i is reserved for internal use",
+	    (OPT_Wprio_ctor_dtor,
+	     "requested %<init_priority%> %i is reserved for internal use",
 	     pri);
 	}

diff --git a/gcc/testsuite/g++.dg/special/initp1.C b/gcc/testsuite/g++.dg/special/initp1.C
index 4a539a5a4bd..ef88ca970b8 100644
--- a/gcc/testsuite/g++.dg/special/initp1.C
+++ b/gcc/testsuite/g++.dg/special/initp1.C
@@ -30,9 +30,9 @@ Two hoo[ 3 ] = {
   Two( 15, 16 )
 };

-Two coo[ 3 ] __attribute__((init_priority(1000)));
-
-Two koo[ 3 ] __attribute__((init_priority(1000))) = {
+Two coo[ 3 ] __attribute__((init_priority(10))); // { dg-warning "reserved" }
+#pragma GCC diagnostic ignored "-Wprio-ctor-dtor"
+Two koo[ 3 ] __attribute__((init_priority(10))) = {
   Two( 21, 22 ),
   Two( 23, 24 ),
   Two( 25, 26 )

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
prerequisite-patch-id: cf6b02f09f22e626404250f9e5fc33e6e0351db2
--
2.48.1
[pushed] testsuite: adjust nontype-class72 for implicit constexpr
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --
This test added by r15-7507 doesn't get some expected diagnostics if we implicitly make I(E) constexpr.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-class72.C: Disable -fimplicit-constexpr.
---
 gcc/testsuite/g++.dg/cpp2a/nontype-class72.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
index 1c48ff57add..c36be7a4a80 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
@@ -1,6 +1,7 @@
 // PR c++/113800
 // P2308R1 - Template parameter initialization
 // { dg-do compile { target c++20 } }
+// { dg-additional-options "-fno-implicit-constexpr" }
 // Invalid cases.
 namespace std {

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
--
2.48.1
[PATCH] dwarf: emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack [PR70536]
From: Ed Catmur

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --
Per https://wiki.dwarfstd.org/C++0x_Variadic_templates.md DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:

  17$:  DW_TAG_formal_parameter_pack
          DW_AT_name("args")
  18$:    DW_TAG_formal_parameter
          ! no DW_AT_name attribute
          DW_AT_type(reference to 13$)
  (...)

	PR c++/70536

gcc/ChangeLog:

	* dwarf2out.cc (gen_formal_parameter_pack_die): Add name attr.

gcc/testsuite/ChangeLog:

	* g++.dg/debug/dwarf2/template-func-params-7.C: Check for pack names.

Co-authored-by: Jason Merrill
---
 gcc/dwarf2out.cc                                           | 2 +-
 gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 43884f206c0..ed7d9402200 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -23195,7 +23195,7 @@ gen_formal_parameter_pack_die (tree parm_pack,
 	      && subr_die);

   parm_pack_die = new_die (DW_TAG_GNU_formal_parameter_pack, subr_die, parm_pack);
-  add_src_coords_attributes (parm_pack_die, parm_pack);
+  add_name_and_src_coords_attributes (parm_pack_die, parm_pack);

   for (arg = pack_arg; arg; arg = DECL_CHAIN (arg))
     {
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C b/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
index 22b0e4f984d..4e95c238bcd 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
@@ -23,6 +23,9 @@
 // These 3 function template instantiations has a total of 3 template
 // parameters named T.
 // { dg-final { scan-assembler-times "\.ascii \"T.0\"\[\t \]+\[^\n\]*DW_AT_name" 3 } }
+// And the packs also have names.
+// { dg-final { scan-assembler-times "\.ascii \"PTs.0\"\[\t \]+\[^\n\]*DW_AT_name" 3 } }
+// { dg-final { scan-assembler-times "\.ascii \"args.0\"\[\t \]+\[^\n\]*DW_AT_name" 3 } }

 void
@@ -35,11 +38,11 @@ printf(const char* s)
   */
 }

-template <typename T, typename... PackTypes>
+template <typename T, typename... PTs>
 void
 printf(const char* s,
        T value,
-       PackTypes... args)
+       PTs... args)
 {
   while (*s)
     {

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
prerequisite-patch-id: cf6b02f09f22e626404250f9e5fc33e6e0351db2
prerequisite-patch-id: 29fd7472d58735638f85059fd1678bba9acf7bf6
--
2.48.1
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On Thu, 13 Feb 2025 06:46:10 PST (-0800), jeffreya...@gmail.com wrote:

On 2/13/25 1:47 AM, Robin Dapp wrote:

Other thoughts? The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff we can't/don't model in the pipeline, but I have no idea how to model the VL=0 case there.

Maybe so, but what Edwin is doing looks sensible enough. It wouldn't be the first time a hook got (ab)used in ways that weren't part of the original intent.

I don't fully understand what's happening. So the hoisting is being done speculatively here? And it just happens to be "bad" because that might cause a VL=0 case. But are we sure a lack of speculation cannot cause such cases?

Yes/No. The scheduler certainly has code to avoid hoisting when doing so would change semantics. That's not what's happening here. I'd have to put it in a debugger or read the full dumps with some crazy scheduler dump verbosity setting to be sure, but what I suspect is happening is the scheduler is processing a multi-block region (effectively an extended basic block). In this scenario the scheduler can pull insns from a later block into an earlier block, including past a conditional branch as long as it doesn't change program semantics.

(Sorry to keep crossing the threads here, there's just a lot in this one and stuff gets truncated.)

FWIW, that's what tripped up my "maybe there's a functional bug here" thought. It looks like the scheduling is seeing

    bne t0, x0, end
    vsetvli t1, t2, ...
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

and thinking it's safe to schedule that like

    vsetvli t1, t2, ...
    bne t0, x0, end
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

which I'd assumed is because the scheduler sees both execution paths overwriting the vector control registers and thus thinks it's safe to move the first vsetvli to execute speculatively. From reading "6. Configuration-Setting Instructions" in vector.md that seems intentional, though, so maybe it's all just fine?
Also, why doesn't the vsetvl pass fix the situation? IMHO we need to understand the problem more thoroughly before changing things. In the end LCM minimizes the number of vsetvls and inserts them at the "earliest" point. If that is not sufficient I'd say we need to modify the constraints (maybe on a per-uarch basis)?

The vsetvl pass is LCM based. So it's not allowed to add a vsetvl on a path that didn't have a vsetvl before. Consider this simple graph.

     0
    / \
   2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl will land in bb4. bb0 is not a valid insertion point for the vsetvl pass because the path 0->3 doesn't strictly need a vsetvl. That's inherent in the LCM algorithm (anticipatable).

The scheduler has no such limitations. The scheduler might create a scheduling region out of blocks 0 and 2. In that scenario, insns from block 2 may speculate into block 0 as long as doing so doesn't change semantics.

Ya. The combination of the scheduler moving a vsetvli before the branch (IIUC from bb2 to bb0 here) and the vsetvli merging causes it to look like the whole vsetvli was moved before the branch. I'm not sure why the scheduler doesn't move both vsetvli instructions to execute speculatively, but otherwise this seems to be behaving as designed. It's just tripping up the VL=0 cases for us.

On a separate note: How about we move the vsetvl pass after sched2? Then we could at least rely on LCM doing its work uninhibited and wouldn't reorder vsetvls afterwards. Or do we somehow rely on rtl_dce and BB reorder to run afterwards? That won't help with the problem here but might with others.

It's a double edged sword. If you defer placement until after scheduling, then the vsetvls can wreak havoc with whatever schedule sched2 came up with. It won't matter much for out of order designs, but potentially does for others. Maybe that's a broad uarch split point here?
For OOO designs we'd want to rely on HW scheduling and thus avoid hoisting possibly-expensive vsetvli instructions (where they'd need to execute in HW because of the side effects), while on in-order designs we'd want to aggressively schedule vsetvli instructions because we can't rely on HW scheduling to hide the latency.

In theory at sched2 time the insn stream should be fixed. There are practical/historical exceptions, but changes to the insn stream after that point are discouraged.

We were just talking about this in our toolchain team meeting, and it seems like both GCC and LLVM are in similar spots here -- essentially the required set of vsetvli instructions depends very strongly on scheduling, so trying to do them independently is just always going to lead to sub-par results. It feels kind of like we want some scheduling-based cost feedback in the vsetvli pass (or the other way around if they're in the other order) to get better results. Maybe that's too much of a time sink for the OOO machines, though? If we've got HW scheduling then the SW just has to be in the ballpark and everything should be fine.
RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]
> -----Original Message-----
> From: Richard Sandiford
> Sent: Thursday, February 13, 2025 4:55 PM
> To: Tamar Christina
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
>
> Tamar Christina writes:
> >> -----Original Message-----
> >> That said, I'm quite sure we don't want to have a dr->target_alignment
> >> that isn't power-of-two, so if the computation doesn't end up with a
> >> power-of-two value we should leave it as the target prefers and
> >> fixup (or fail) during vectorizable_load.
> >
> > Ack I'll round up to power of 2.
>
> I don't think that's enough.  Rounding up 3 would give 4, but a group
> size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X
> for some X.  [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.

Indeed, instead of rounding up I just reject the non-power-of-2 alignment requests in vectorizable_load as Richi originally requested. I thought I could get it to work better by rounding up but it doesn't seem worth it.

Cheers,
Tamar

> Thanks,
> Richard
[pushed: r15-7515] jit: add "final override" to diagnostic sink [PR116613]
I added class jit_diagnostic_listener in r15-4760-g0b73e9382ab51c but forgot to annotate one of the vfuncs with "override". Fixed thusly.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7515-g6ac313525a1fae.

gcc/jit/ChangeLog:

	PR other/116613
	* dummy-frontend.cc
	(jit_diagnostic_listener::on_report_diagnostic): Add "final
	override".

Signed-off-by: David Malcolm
---
 gcc/jit/dummy-frontend.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
index 1d0080d6fec..88784ec9e92 100644
--- a/gcc/jit/dummy-frontend.cc
+++ b/gcc/jit/dummy-frontend.cc
@@ -1017,7 +1017,7 @@ public:
   }

   void on_report_diagnostic (const diagnostic_info &info,
-			     diagnostic_t orig_diag_kind)
+			     diagnostic_t orig_diag_kind) final override
   {
     JIT_LOG_SCOPE (gcc::jit::active_playback_ctxt->get_logger ());
--
2.26.3
Patch ping^6 (Re: [PATCH] analyzer: Handle nonnull_if_nonzero attribute [PR117023])
On Thu, Feb 06, 2025 at 04:30:47PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 21, 2025 at 04:59:16PM +0100, Jakub Jelinek wrote:
> > On Tue, Jan 07, 2025 at 01:49:04PM +0100, Jakub Jelinek wrote:
> > > On Wed, Dec 18, 2024 at 12:15:15PM +0100, Jakub Jelinek wrote:
> > > > On Fri, Dec 06, 2024 at 05:07:40PM +0100, Jakub Jelinek wrote:
> > > > > I'd like to ping the
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668699.html
> > > > > patch.
> > > > >
> > > > > The patches it depended on are already committed and there is a patch
> > > > > which depends on this (the builtins shift from nonnull to
> > > > > nonnull_if_nonzero where needed) which has been approved but can't
> > > > > be committed.
> > > >
> > > > Gentle ping on this one.
> > >
> > > Ping.
> >
> > Ping again.
>
> Ping.

Ping again.

> > > > > > 2024-11-14  Jakub Jelinek
> > > > > >
> > > > > > 	PR c/117023
> > > > > > gcc/analyzer/
> > > > > > 	* sm-malloc.cc (malloc_state_machine::on_stmt): Handle
> > > > > > 	also nonnull_if_nonzero attributes.
> > > > > > gcc/testsuite/
> > > > > > 	* c-c++-common/analyzer/call-summaries-malloc.c
> > > > > > 	(test_use_without_check): Pass 4 rather than sz to memset.
> > > > > > 	* c-c++-common/analyzer/strncpy-1.c (test_null_dst,
> > > > > > 	test_null_src): Pass 42 rather than count to strncpy.

Jakub
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 11:13 AM, Palmer Dabbelt wrote:

FWIW, that's what tripped up my "maybe there's a functional bug here" thought. It looks like the scheduling is seeing

    bne t0, x0, end
    vsetvli t1, t2, ...
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

and thinking it's safe to schedule that like

    vsetvli t1, t2, ...
    bne t0, x0, end
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

which I'd assumed is because the scheduler sees both execution paths overwriting the vector control registers and thus thinks it's safe to move the first vsetvli to execute speculatively. From reading "6. Configuration-Setting Instructions" in vector.md that seems intentional, though, so maybe it's all just fine?

I think it's fine. Perhaps not what we want from a performance standpoint, but functionally safe.

Also, why doesn't the vsetvl pass fix the situation? IMHO we need to understand the problem more thoroughly before changing things. In the end LCM minimizes the number of vsetvls and inserts them at the "earliest" point. If that is not sufficient I'd say we need to modify the constraints (maybe on a per-uarch basis)?

The vsetvl pass is LCM based. So it's not allowed to add a vsetvl on a path that didn't have a vsetvl before. Consider this simple graph.

     0
    / \
   2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl will land in bb4. bb0 is not a valid insertion point for the vsetvl pass because the path 0->3 doesn't strictly need a vsetvl. That's inherent in the LCM algorithm (anticipatable).

The scheduler has no such limitations. The scheduler might create a scheduling region out of blocks 0 and 2. In that scenario, insns from block 2 may speculate into block 0 as long as doing so doesn't change semantics.

Ya. The combination of the scheduler moving a vsetvli before the branch (IIUC from bb2 to bb0 here) and the vsetvli merging causes it to look like the whole vsetvli was moved before the branch.
I'm not sure why the scheduler doesn't move both vsetvli instructions to execute speculatively, but otherwise this seems to be behaving as designed. It's just tripping up the VL=0 cases for us.

You'd have to get into those dumps and possibly throw the compiler under a debugger. My guess is it didn't see any advantage in doing so.

Maybe that's a broad uarch split point here? For OOO designs we'd want to rely on HW scheduling and thus avoid hoisting possibly-expensive vsetvli instructions (where they'd need to execute in HW because of the side effects), while on in-order designs we'd want to aggressively schedule vsetvli instructions because we can't rely on HW scheduling to hide the latency.

There may be. But the natural question would be cost/benefit. It may not buy us anything on the performance side to defer vsetvl insertion for OOO cores. At which point the only advantage is testsuite stability. And if that's the only benefit, we may be able to do that through other mechanisms.

In theory at sched2 time the insn stream should be fixed. There are practical/historical exceptions, but changes to the insn stream after that point are discouraged.

We were just talking about this in our toolchain team meeting, and it seems like both GCC and LLVM are in similar spots here -- essentially the required set of vsetvli instructions depends very strongly on scheduling, so trying to do them independently is just always going to lead to sub-par results. It feels kind of like we want some scheduling-based cost feedback in the vsetvli pass (or the other way around if they're in the other order) to get better results. Maybe that's too much of a time sink for the OOO machines, though? If we've got HW scheduling then the SW just has to be in the ballpark and everything should be fine.

I'd guess it's more work than it'd be worth. We're just not seeing vsetvls being all that problematical on our design.
I do see a lot of seemingly gratuitous changes in the vector config, but when we make changes to fix that we generally end up with worse performing code.

Jeff
[PATCH] c++: fix propagating REF_PARENTHESIZED_P [PR116379]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we have:

  template <typename T>
  struct X {
    T val;
    decltype(auto) value() { return (val); }
  };

where the return type of value should be 'int &' since '(val)' is an expression, not a name, and decltype(auto) performs the type deduction using the decltype rules.

The problem is that we weren't propagating REF_PARENTHESIZED_P correctly: the return value of finish_non_static_data_member in this test was a REFERENCE_REF_P, so we didn't set the flag. We should use force_paren_expr like below.

	PR c++/116379

gcc/cp/ChangeLog:

	* pt.cc (tsubst_expr): Use force_paren_expr to set
	REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/decltype-auto9.C: New test.
---
 gcc/cp/pt.cc                                |  4 ++--
 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a2fc8813e9d..5706a3987c3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21712,8 +21712,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	  {
 	    r = finish_non_static_data_member (member, object, NULL_TREE,
 					       complain);
-	    if (TREE_CODE (r) == COMPONENT_REF)
-	      REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+	    if (REF_PARENTHESIZED_P (t))
+	      force_paren_expr (r);
 	    RETURN (r);
 	  }
 	else if (type_dependent_expression_p (object))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
new file mode 100644
index 000..1ccf95a0170
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
@@ -0,0 +1,15 @@
+// PR c++/116379
+// { dg-do compile { target c++14 } }
+
+template <typename T>
+struct X {
+  T val;
+  decltype(auto) value() { return (val); }
+};
+
+int main() {
+  int i = 0;
+  X<int&&> x{ static_cast<int&&>(i) };
+  using type = decltype(x.value());
+  using type = int&;
+}

base-commit: a134dcd8a010744a0097d190f73a4efc2e381531
--
2.48.1
[PATCH V3] RISC-V: Prevent speculative vsetvl insn scheduling
The instruction scheduler appears to be speculatively hoisting vsetvl insns outside of their basic block without checking for data dependencies. This resulted in a situation where the following occurs:

vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
sub a1,a1,a5                <-- a1 potentially set to 0
sh2add a0,a5,a0
vfmacc.vv v1,v2,v2
vsetvli a5,a1,e32,m1,tu,ma  <-- incompatible vinfo. update vl to 0
beq a1,zero,.L12            <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch to prevent unnecessarily updating the vinfo at the end of a basic block. Since this is purely a performance related patch, gate the target hook with an opt flag to see the fallout. PR 117974 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_sched_can_speculate_insn): New function. (TARGET_SCHED_CAN_SPECULATE_INSN): Implement. * config/riscv/riscv.opt: Add temporary opt. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr117974.c: New test. Signed-off-by: Edwin Lu --- V2: add testcase V3: add opt flag to test performance --- gcc/config/riscv/riscv.cc | 25 +++ gcc/config/riscv/riscv.opt| 4 +++ .../gcc.target/riscv/rvv/vsetvl/pr117974.c| 17 + 3 files changed, 46 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 6e14126e3a4..7203594b526 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -10209,6 +10209,28 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost, return new_cost; } +/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook. Return true if insn + can be scheduled for speculative execution. Reject vsetvl instructions to + prevent the scheduler from hoisting them out of basic blocks without + checking for data dependencies. See PR117974. */ +static bool +riscv_sched_can_speculate_insn (rtx_insn *insn) +{ + /* Gate speculative scheduling of vsetvl instructions behind opt flag + for performance testing purposes. 
*/ + if (!vsetvl_speculative_sched) +return true; + + switch (get_attr_type (insn)) +{ + case TYPE_VSETVL: + case TYPE_VSETVL_PRE: + return false; + default: + return true; +} +} + /* Auxiliary function to emit RISC-V ELF attribute. */ static void riscv_emit_attribute () @@ -14055,6 +14077,9 @@ bool need_shadow_stack_push_pop_p () #undef TARGET_SCHED_ADJUST_COST #define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost +#undef TARGET_SCHED_CAN_SPECULATE_INSN +#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn + #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index 7515c8ea13d..486ba746d99 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -681,3 +681,7 @@ Specifies whether the fence.tso instruction should be used. mautovec-segment Target Integer Var(riscv_mautovec_segment) Init(1) Enable (default) or disable generation of vector segment load/store instructions. + +-param=vsetvl-speculative-sched +Target Undocumented Uinteger Var(vsetvl_speculative_sched) Init(0) +-param=vsetvl-speculative-sched Enable speculative scheduling of vsetvl instructions. diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c new file mode 100644 index 000..97839427987 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl -Ofast" } */ +/* { dg-additional-options "--param=vsetvl-speculative-sched" } */ + +float g(float q[], int N){ +float dqnorm = 0.0; + +#pragma GCC unroll 4 + +for (int i=0; i < N; i++) { +dqnorm = dqnorm + q[i] * q[i]; +} +return dqnorm; +} + +/* { dg-final { scan-assembler-times {beq\s+[a-x0-9]+,zero,.L12\s+vsetvli} 3 } } */ + -- 2.43.0
Re: [patch, Fortran] Fix PR 118845
Hi Jerry,

> This is OK.

Pushed as r15-7509. Thanks for the review!

> It would be good to get confirmation that the lapack builds now. I used to be set up here to do that, but don't have it at the moment.

I checked the original test case; that passed. But yes, a Lapack tester would be nice. Now on to PR118862 (but not today :-)

Best regards

Thomas
[PATCH v2 1/1] gdc: define ELFv1 and ELFv2 versions for powerpc
From: Zixing Liu gcc/ChangeLog: * config/rs6000/rs6000-d.cc: define ELFv1 and ELFv2 version identifiers according to the target options. gcc/testsuite/ChangeLog: * gdc.dg/ppcabi.d: Add a test to test for code generation correctness when using IEEE 128 and new ELFv1 and ELFv2 identifiers. Signed-off-by: Zixing Liu --- gcc/config/rs6000/rs6000-d.cc | 5 + gcc/testsuite/gdc.dg/ppcabi.d | 23 +++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gdc.dg/ppcabi.d diff --git a/gcc/config/rs6000/rs6000-d.cc b/gcc/config/rs6000/rs6000-d.cc index c9e1acad88..bc5d643d49 100644 --- a/gcc/config/rs6000/rs6000-d.cc +++ b/gcc/config/rs6000/rs6000-d.cc @@ -45,6 +45,11 @@ rs6000_d_target_versions (void) d_add_builtin_version ("PPC_SoftFloat"); d_add_builtin_version ("D_SoftFloat"); } + + if (DEFAULT_ABI == ABI_ELFv2) +d_add_builtin_version ("ELFv2"); + else +d_add_builtin_version ("ELFv1"); } /* Handle a call to `__traits(getTargetInfo, "floatAbi")'. */ diff --git a/gcc/testsuite/gdc.dg/ppcabi.d b/gcc/testsuite/gdc.dg/ppcabi.d new file mode 100644 index 00..9271c64436 --- /dev/null +++ b/gcc/testsuite/gdc.dg/ppcabi.d @@ -0,0 +1,23 @@ +// { dg-do compile { target { powerpc64*-linux-gnu* } } } +// { dg-options "-mabi=ieeelongdouble -mabi=elfv2 -mcpu=power9 -O2" } + +// { dg-final { scan-assembler "_Z13test_functionu9__ieee128" } } +extern (C++) bool test_function(real arg) { +// { dg-final { scan-assembler "xscmpuqp" } } +// { dg-final { scan-assembler-not "fcmpu" } } +return arg > 0.0; +} + +// { dg-final { scan-assembler "test_version" } } +extern (C) bool test_version() { +// { dg-final { scan-assembler "li 3,1" } } +version (PPC64) return real.mant_dig == 113; +else return false; +} + +// { dg-final { scan-assembler "test_elf_version" } } +extern (C) bool test_elf_version() { +// { dg-final { scan-assembler "li 3,0" } } +version (ELFv2) return false; +else return true; +} -- 2.48.1
[PATCH v2 0/1] gdc: define ELFv1 and ELFv2 versions for powerpc
From: Zixing Liu

This patch was formerly known as "gdc: define ELFv1, ELFv2 and D_PPCUseIEEE128 versions for powerpc". Due to new developments in https://github.com/dlang/dmd/pull/20826, the compiler is no longer required to mark the D_PPCUseIEEE128 version identifier; instead, correctly setting real.mant_dig suffices (GDC is already providing the correct information). The patch adds the ELFv1 and ELFv2 version identifiers to bridge the gap between the LLVM D Compiler (LDC) and the GNU D Compiler (GDC) so that the user can reliably use the "version(...)" syntax to check which ABI is currently in use. The ELFv1 and ELFv2 ABI concepts seem to exist only on POWER platforms, so other platforms do not need to follow this change, as far as I know.

Zixing Liu (1): gdc: define ELFv1 and ELFv2 versions for powerpc

gcc/config/rs6000/rs6000-d.cc | 5 + gcc/testsuite/gdc.dg/ppcabi.d | 23 +++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gdc.dg/ppcabi.d -- 2.48.1
Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines
On 2/10/25 2:25 AM, Andre Vehreschild wrote:
> [PATCH 7/7] Fortran: Remove deprecated coarray routines [PR107635]

I have applied all patches. Regression tested OK here. From patch 5 there was one reject:

patching file gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90
Hunk #1 FAILED at 39.
1 out of 1 hunk FAILED -- saving rejects to file gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90.rej

I commented earlier about changing the name of rewrite.cc. I am now going through the whole enchilada for editorial stuff.

Regards, Jerry
[patch, Fortran] Fix PR 118845
Hello world, this was an interesting regression. It came from my recent patch, where an assert was triggered because a procedure artificial dummy argument generated for a global symbol did not have the information if it was a function or a subroutine. Fixed by adding the information in gfc_get_formal_from_actual_arglist. This information then uncovered some new errors, also in the testsuite, which needed fixing. Finally, the error is made to look a bit nicer, so the user gets a pointer to where the original interface comes from, like this:

   10 | CALL bar (test2) ! { dg-error "Interface mismatch in dummy procedure" }
      | 1
..
   16 | CALL bar (test) ! { dg-error "Interface mismatch in dummy procedure" }
      | 2
Fehler: Interface mismatch in dummy procedure at (1) conflicts with (2): 'test2' is not a subroutine

Regression-tested. OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog: PR fortran/118845 * interface.cc (compare_parameter): If the formal attribute has been generated from an actual argument list, also output a pointer to there in case of an error. (gfc_get_formal_from_actual_arglist): Set function and subroutine attributes and (if it is a function) the typespec from the actual argument. gcc/testsuite/ChangeLog: PR fortran/118845 * gfortran.dg/recursive_check_4.f03: Adjust call so types match. * gfortran.dg/recursive_check_6.f03: Likewise. * gfortran.dg/specifics_2.f90: Adjust calls so types match. * gfortran.dg/interface_52.f90: New test. * gfortran.dg/interface_53.f90: New test. diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc index fdde84db80d..edec907d33a 100644 --- a/gcc/fortran/interface.cc +++ b/gcc/fortran/interface.cc @@ -2474,8 +2474,16 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual, sizeof(err),NULL, NULL)) { if (where) - gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at %L:" - " %s", formal->name, &actual->where, err); + { + /* Artificially generated symbol names would only confuse. 
*/ + if (formal->attr.artificial) + gfc_error_opt (0, "Interface mismatch in dummy procedure " + "at %L conflicts with %L: %s", &actual->where, + &formal->declared_at, err); + else + gfc_error_opt (0, "Interface mismatch in dummy procedure %qs " + "at %L: %s", formal->name, &actual->where, err); + } return false; } @@ -2483,8 +2491,16 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual, sizeof(err), NULL, NULL)) { if (where) - gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at %L:" - " %s", formal->name, &actual->where, err); + { + if (formal->attr.artificial) + gfc_error_opt (0, "Interface mismatch in dummy procedure " + "at %L conflicts with %L: %s", &actual->where, + &formal->declared_at, err); + else + gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at " + "%L: %s", formal->name, &actual->where, err); + + } return false; } @@ -5822,7 +5838,14 @@ gfc_get_formal_from_actual_arglist (gfc_symbol *sym, gfc_get_symbol (name, gfc_current_ns, &s); if (a->expr->ts.type == BT_PROCEDURE) { + gfc_symbol *asym = a->expr->symtree->n.sym; s->attr.flavor = FL_PROCEDURE; + if (asym->attr.function) + { + s->attr.function = 1; + s->ts = asym->ts; + } + s->attr.subroutine = asym->attr.subroutine; } else { diff --git a/gcc/testsuite/gfortran.dg/interface_52.f90 b/gcc/testsuite/gfortran.dg/interface_52.f90 new file mode 100644 index 000..4d619241c27 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/interface_52.f90 @@ -0,0 +1,20 @@ + ! { dg-do compile } +MODULE m + IMPLICIT NONE + +CONTAINS + + SUBROUTINE test () +IMPLICIT NONE + +CALL bar (test2) ! { dg-error "Interface mismatch in dummy procedure" } + END SUBROUTINE test + + INTEGER FUNCTION test2 () RESULT (x) +IMPLICIT NONE + +CALL bar (test) ! 
{ dg-error "Interface mismatch in dummy procedure" } + END FUNCTION test2 + +END MODULE m + diff --git a/gcc/testsuite/gfortran.dg/interface_53.f90 b/gcc/testsuite/gfortran.dg/interface_53.f90 new file mode 100644 index 000..99a2b959463 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/interface_53.f90 @@ -0,0 +1,8 @@ +! { dg-do compile } +! PR 118845 - reduced from a segfault in Lapack. +SUBROUTINE SDRVES( RESULT ) + external SSLECT + CALL SGEES( SSLECT ) + CALL SGEES( SSLECT ) + RESULT = SSLECT( 1, 2 ) +END diff --git a/gcc/testsuite/gfortran.dg/recursive_check_4.f03 b/gcc/testsuite/gfortran.dg/recursive_check_4.f03 index ece42ca2312..da45762f9b1 100644 --- a/gcc/testsuite/gfortran.dg/recursive_check_4.f03 +++ b/gcc/testsuite/gfortran.dg/recursive_check_4.f03 @@ -20,7 +20,7 @@ CONTAINS IMPLICIT NONE PRO
[pushed] c++: -frange-for-ext-temps and reused temps [PR118856]
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- Some things in the front-end use a TARGET_EXPR to create a temporary, then refer to its TARGET_EXPR_SLOT separately later; in this testcase, maybe_init_list_as_range does. So we need to handle that pattern in extend_all_temps. PR c++/118856 gcc/cp/ChangeLog: * call.cc (struct extend_temps_data): Add var_map. (extend_all_temps): Adjust. (set_up_extended_ref_temp): Make walk_data void*. (extend_temps_r): Remap variables. Handle pset here. Extend all TARGET_EXPRs. gcc/testsuite/ChangeLog: * g++.dg/cpp23/range-for9.C: New test. --- gcc/cp/call.cc | 89 + gcc/testsuite/g++.dg/cpp23/range-for9.C | 20 ++ 2 files changed, 68 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp23/range-for9.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index 2c77b4a4b68..38a8f7fdcda 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -14154,18 +14154,6 @@ make_temporary_var_for_ref_to_temp (tree decl, tree type) return pushdecl (var); } -/* Data for extend_temps_r, mostly matching the parameters of - extend_ref_init_temps. */ - -struct extend_temps_data -{ - tree decl; - tree init; - vec **cleanups; - tree* cond_guard; - hash_set *pset; -}; - static tree extend_temps_r (tree *, int *, void *); /* EXPR is the initializer for a variable DECL of reference or @@ -14177,7 +14165,7 @@ static tree extend_temps_r (tree *, int *, void *); static tree set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups, tree *initp, tree *cond_guard, - extend_temps_data *walk_data) + void *walk_data) { tree init; tree type; @@ -14218,7 +14206,7 @@ set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups, maybe_constant_init because the extension might change its result. 
*/ if (walk_data) cp_walk_tree (&TARGET_EXPR_INITIAL (expr), extend_temps_r, - walk_data, walk_data->pset); + walk_data, nullptr); else TARGET_EXPR_INITIAL (expr) = extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups, @@ -14833,6 +14821,19 @@ extend_ref_init_temps_1 (tree decl, tree init, vec **cleanups, return init; } +/* Data for extend_temps_r, mostly matching the parameters of + extend_ref_init_temps. */ + +struct extend_temps_data +{ + tree decl; + tree init; + vec **cleanups; + tree* cond_guard; + hash_set *pset; // For avoiding redundant walk_tree. + hash_map *var_map; // For remapping extended temps. +}; + /* Tree walk function for extend_all_temps. Generally parallel to extend_ref_init_temps_1, but adapted for walk_tree. */ @@ -14841,7 +14842,15 @@ extend_temps_r (tree *tp, int *walk_subtrees, void *data) { extend_temps_data *d = (extend_temps_data *)data; - if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR) + if (TREE_CODE (*tp) == VAR_DECL) +{ + if (tree *r = d->var_map->get (*tp)) + *tp = *r; + return NULL_TREE; +} + + if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR + || d->pset->add (*tp)) { *walk_subtrees = 0; return NULL_TREE; @@ -14849,13 +14858,13 @@ extend_temps_r (tree *tp, int *walk_subtrees, void *data) if (TREE_CODE (*tp) == COND_EXPR) { - cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, d->pset); + cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, nullptr); auto walk_arm = [d](tree &op) { tree cur_cond_guard = NULL_TREE; auto ov = make_temp_override (d->cond_guard, &cur_cond_guard); - cp_walk_tree (&op, extend_temps_r, d, d->pset); + cp_walk_tree (&op, extend_temps_r, d, nullptr); if (cur_cond_guard) { tree set = build2 (MODIFY_EXPR, boolean_type_node, @@ -14870,29 +14879,25 @@ extend_temps_r (tree *tp, int *walk_subtrees, void *data) return NULL_TREE; } - if (TREE_CODE (*tp) == ADDR_EXPR - /* A discarded-value temporary. 
*/ - || (TREE_CODE (*tp) == CONVERT_EXPR - && VOID_TYPE_P (TREE_TYPE (*tp -{ - tree *p; - for (p = &TREE_OPERAND (*tp, 0); - TREE_CODE (*p) == COMPONENT_REF || TREE_CODE (*p) == ARRAY_REF; ) - p = &TREE_OPERAND (*p, 0); - if (TREE_CODE (*p) == TARGET_EXPR) - { - tree subinit = NULL_TREE; - *p = set_up_extended_ref_temp (d->decl, *p, d->cleanups, &subinit, -d->cond_guard, d); - if (TREE_CODE (*tp) == ADDR_EXPR) - recompute_tree_invariant_for_addr_expr (*tp); - if (subinit) - *tp = cp_build_compound_expr (subinit, *tp, tf_none); - } -} + tree *p = tp; - /* TARGET_EXPRs that aren't handled by the above are implementation details - that shouldn't be ref-exten
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/13/25 8:19 AM, Robin Dapp wrote:
>> The vsetvl pass is LCM based. So it's not allowed to add a vsetvl on a path that didn't have a vsetvl before. Consider this simple graph.
>>
>>   0
>>  / \
>> 2-->3
>>
>> If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl will land in bb4. bb0 is not a valid insertion point for the vsetvl pass because the path 0->3 doesn't strictly need a vsetvl. That's inherent in the LCM algorithm (anticipatable).
>
> Yeah, I remember the same issue with the rounding-mode setter placement.

Yes. For VXRM placement, under the right circumstances we pretend there is a need for the VXRM state at the first instruction in the first BB. That enables very aggressive hoisting by LCM in those limited cases.

> Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3 (or any other block that doesn't require one)? Such a dummy vsetvl would be fusible with every other vsetvl. If there are dummy vsetvls remaining after LCM just delete them? Just thinking out loud, the devil will be in the details.

But in Vineet's case they want to avoid speculation as that can result in a vl=0 case. If we had a dummy fusible vsetvl in bb3, then that would allow movement into bb0, which is undesirable.

WRT a question Palmer asked earlier in the thread: I went back and reviewed the code/docs around the hook Edwin is using. My reading is a bit different: what Edwin is doing is perfectly fine.

Jeff
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
Vladimir Makarov writes: > On 2/7/25 12:18 PM, Richard Sandiford wrote: >> FWIW, here's a very rough initial version of the kind of thing >> I was thinking about. Hopefully the hook documentation describes >> the approach. It's deliberately (overly?) flexible. >> >> I've included an aarch64 version that (a) models the fact that the >> first caller-save can also allocate the frame more-or-less for free, >> and (b) once we've saved an odd number of GPRs, saving one more is >> essentialy free. I also hacked up an x86 version locally to model >> the allocation benefits of using caller-saved registers. It seemed >> to fix the povray example above. >> >> This still needs a lot of clean-up and testing, but I thought I might >> as well send what I have before leaving for the weekend. Does it look >> reasonable in principle? >> > Richard, thank you for continuing work on this problem. These hooks and > their implementation have much more sense to me. Although it is > difficult to predict that it will solve all existing related PRs. You > definitely get my approval of your hooks if you will manage not to have > new GCC testsuite failures with these hooks on x86-64, aarch64, and ppc64. Thanks Vlad! Here's an updated patch that passes testing aarch64-linux-gnu and x86_64-linux-gnu. I haven't yet checked ppc64, but will do that. Just wanted to post what I have before going off on a long weekend. As described below, the patch also shows no change to AArch64 SPEC2017 scores. I'm afraid I'll need help from x86 folks to do performance testing there. 
Richard >From 46ad583e65a1c5a27e2203a7571bba6eb0766bc6 Mon Sep 17 00:00:00 2001 From: Richard Sandiford Date: Fri, 7 Feb 2025 15:40:21 + Subject: [PATCH] ira: Add new hooks for callee-save vs spills [PR117477] To: gcc-patches@gcc.gnu.org Following on from the discussion in: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and replaces it with two hooks: one that controls the cost of using an extra callee-saved register and one that controls the cost of allocating a frame for the first spill. (The patch does not attempt to address the shrink-wrapping part of the thread above.) On AArch64, this is enough to fix PR117477, as verified by the new tests. The patch does not change the SPEC2017 scores. (An earlier version did regress perlbench, because the aarch64 hook in that version incorrectly treated call-preserved registers as having the same cost as call-clobbered registers, even for pseudos that are not live across a call. Oops.) The x86 change follows Honza's suggestion of deducting 2 from the current cost, to model the saving of using push & pop. With the new hooks, we could instead increase the cost of using a caller-saved register (i.e. model the extra add and sub), but I haven't tried that. I did however check that deducting 1 instead of 2 was enough to make pr91384.c pass for -mabi=32 but not for -mabi=64. gcc/ PR rtl-optimization/117477 * config/aarch64/aarch64.cc (aarch64_count_saves): New function. (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) (aarch64_frame_allocation_cost): Likewise. (TARGET_CALLEE_SAVE_COST): Define. (TARGET_FRAME_ALLOCATION_COST): Likewise. * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): Replace with... (ix86_callee_save_cost): ...this new hook. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST): Define. * target.h (spill_cost_type, frame_cost_type): New enums. 
* target.def (callee_save_cost, frame_allocation_cost): New hooks. (ira_callee_saved_register_cost_scale): Delete. * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks. * doc/tm.texi: Regenerate. * hard-reg-set.h (hard_reg_set_popcount): New function. * ira-color.cc (allocated_memory_p): New variable. (allocated_callee_save_regs): Likewise. (record_allocation): New function. (assign_hard_reg): Use targetm.frame_allocation_cost to model the cost of the first spill or first caller save. Use targetm.callee_save_cost to model the cost of using new callee-saved registers. Apply the exit rather than entry frequency to the cost of restoring a register or deallocating the frame. Update the new variables above. (improve_allocation): Use record_allocation. (color): Initialize allocated_callee_save_regs. (ira_color): Initialize allocated_memory_p. * targhooks.h (default_callee_save_cost): Declare. (default_frame_allocation_cost): Likewise. * targhooks.cc (default_callee_save_cost): New function. (default_frame_allocation_cost): Likewise. gcc/testsuite/ PR rtl-
Re: [patch, Fortran] Fix PR 118845
On 2/13/25 11:59 AM, Thomas Koenig wrote:
> Hello world, this was an interesting regression. It came from my recent patch, where an assert was triggered because a procedure artificial dummy argument generated for a global symbol did not have the information if it was a function or a subroutine. Fixed by adding the information in gfc_get_formal_from_actual_arglist. This information then uncovered some new errors, also in the testsuite, which needed fixing. Finally, the error is made to look a bit nicer, so the user gets a pointer to where the original interface comes from, like this:
>
>    10 | CALL bar (test2) ! { dg-error "Interface mismatch in dummy procedure" }
>       | 1
> ..
>    16 | CALL bar (test) ! { dg-error "Interface mismatch in dummy procedure" }
>       | 2
> Fehler: Interface mismatch in dummy procedure at (1) conflicts with (2): 'test2' is not a subroutine
>
> Regression-tested. OK for trunk?

This is OK. It would be good to get confirmation that the lapack builds now. I used to be set up here to do that, but don't have it at the moment.

Thanks for the quick fix.

Jerry

> Best regards Thomas
>
> gcc/fortran/ChangeLog: PR fortran/118845 * interface.cc (compare_parameter): If the formal attribute has been generated from an actual argument list, also output a pointer to there in case of an error. (gfc_get_formal_from_actual_arglist): Set function and subroutine attributes and (if it is a function) the typespec from the actual argument. gcc/testsuite/ChangeLog: PR fortran/118845 * gfortran.dg/recursive_check_4.f03: Adjust call so types match. * gfortran.dg/recursive_check_6.f03: Likewise. * gfortran.dg/specifics_2.f90: Adjust calls so types match. * gfortran.dg/interface_52.f90: New test. * gfortran.dg/interface_53.f90: New test.
Re: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]
Tamar Christina writes:
>> That said, I'm quite sure we don't want to have a dr->target_alignment
>> that isn't power-of-two, so if the computation doesn't end up with a
>> power-of-two value we should leave it as the target prefers and
>> fixup (or fail) during vectorizable_load.
>
> Ack I'll round up to power of 2.

I don't think that's enough. Rounding up 3 would give 4, but a group size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X for some X. [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.

Thanks, Richard
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
> I did try adding some additional logic to adjust the way vsetvl fusion > occurs across basic blocks in these scenarios i.e. performing the > fusion in the opposite manner (breaking lcm guarantees); however, from > my testing, fusing two vsetvls didn't actually remove the fused > expression from the vinfo list. I'm not sure if that's intended but as a > result, phase 3 would remove the fused block and use the vinfo that > should've been fused into the other. It depends on the specific example but keeping deleted vsetvls/infos around has a purpose because it helps delete other vsetvls still. I don't recall details but I remember having at least a few examples for it. -- Regards Robin
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On Thu, 13 Feb 2025 07:38:13 PST (-0800), jeffreya...@gmail.com wrote: On 2/13/25 8:19 AM, Robin Dapp wrote: The vsevl pass is LCM based. So it's not allowed to add a vsetvl on a path that didn't have a vsetvl before. Consider this simple graph. 0 / \ 2-->3 If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl will land in bb4. bb0 is not a valid insertion point for the vsetvl pass because the path 0->3 doesn't strictly need a vsetvl. That's inherent in the LCM algorithm (anticipatable). Yeah, I remember the same issue with the rounding-mode setter placement. Yes. For VXRM placement, under the right circumstances we pretend there is a need for the VXRM state at the first instruction in the first BB. That enables very aggressive hoisting by LCM in those limited cases. Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3 (or any other block that doesn't require one)? Such a dummy vsetvl would be fusible with every other vsetvl. If there are dummy vsetvls remaining after LCM just delete them? Just thinking out loud, the devil will be in the details. But in Vineet's case they want to avoid speculation as that can result in a vl=0 case. If we had a dummy fusible vsetvl in bb3, then that would allow movement into bb0 which is undesirable. Ya, I think we confused everyone because there's really two vsetvli/branch movement things we've been talking about and they're kind of the opposite. There's the issue this patch works around, where we found some vsetvli instances that set VL=0 in unrolled loops. That makes some of our hardware people upset. Turns out the reduced test case has the branches to early-out of the unrolled loop when VL would be 0, so just banning vsetvli speculation fixes the issue. It's kind of a indirect way to solve a uarch-specific problem, so who knows if it'll be worth doing. Then there's the vsetvli loop-invarint hoisting / vector tail generation thing we were talking about in the meeting this week. 
Having the vsetvli in the loop made a different subset of our hardware people upset. That's kind of the opposite optimization, though we'd want to avoid the VL=0 case. They're both "Vineet's bug", the hardware people tend to call Vineet when they get upset ;) WRT a question Palmer asked earlier in the thread. I went back and reviewed the code/docs around the hook Edwin is using. My reading is a bit different and that what Edwin is doing is perfectly fine. Awesome, thanks. So I think if this is sane enough to run experiments we can at least try that out and see what happens. Jeff
Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]
Ping. On Thu, Feb 06, 2025 at 11:26:48AM -0500, Marek Polacek wrote: > Ping. > > On Tue, Jan 21, 2025 at 11:05:46AM -0500, Marek Polacek wrote: > > Ping. > > > > On Fri, Jan 10, 2025 at 03:07:52PM -0500, Marek Polacek wrote: > > > Ping. > > > > > > On Fri, Dec 20, 2024 at 08:58:05AM -0500, Marek Polacek wrote: > > > > Ping. > > > > > > > > On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote: > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > > > > > > > -- >8 -- > > > > > As the manual states, using "-fhardened -fstack-protector" will > > > > > produce > > > > > a warning because -fhardened wants to enable -fstack-protector-strong, > > > > > but it can't since it's been overriden by the weaker > > > > > -fstack-protector. > > > > > > > > > > -fhardened also attempts to enable -Wl,-z,relro,-z,now. By the same > > > > > logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should > > > > > produce the same warning. But we don't detect this combination, so > > > > > this patch fixes it. I also renamed a variable to better reflect its > > > > > purpose. > > > > > > > > > > Also don't check warn_hardened in process_command, since it's always > > > > > true there. > > > > > > > > > > Also tweak wording in the manual as Jon Wakely suggested on IRC. > > > > > > > > > > PR driver/117739 > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * doc/invoke.texi: Tweak wording for -Whardened. > > > > > * gcc.cc (driver_handle_option): If -z lazy or -z norelro was > > > > > specified, don't enable linker hardening. > > > > > (process_command): Don't check warn_hardened. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > * c-c++-common/fhardened-16.c: New test. > > > > > * c-c++-common/fhardened-17.c: New test. > > > > > * c-c++-common/fhardened-18.c: New test. > > > > > * c-c++-common/fhardened-19.c: New test. > > > > > * c-c++-common/fhardened-20.c: New test. > > > > > * c-c++-common/fhardened-21.c: New test. 
> > > > > --- > > > > > gcc/doc/invoke.texi | 4 ++-- > > > > > gcc/gcc.cc| 20 ++-- > > > > > gcc/testsuite/c-c++-common/fhardened-16.c | 5 + > > > > > gcc/testsuite/c-c++-common/fhardened-17.c | 5 + > > > > > gcc/testsuite/c-c++-common/fhardened-18.c | 5 + > > > > > gcc/testsuite/c-c++-common/fhardened-19.c | 5 + > > > > > gcc/testsuite/c-c++-common/fhardened-20.c | 5 + > > > > > gcc/testsuite/c-c++-common/fhardened-21.c | 5 + > > > > > 8 files changed, 46 insertions(+), 8 deletions(-) > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c > > > > > create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c > > > > > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > > > index 346ac1369b8..371f723539c 100644 > > > > > --- a/gcc/doc/invoke.texi > > > > > +++ b/gcc/doc/invoke.texi > > > > > @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}. > > > > > Warn when @option{-fhardened} did not enable an option from its set > > > > > (for > > > > > which see @option{-fhardened}). For instance, using > > > > > @option{-fhardened} > > > > > and @option{-fstack-protector} at the same time on the command line > > > > > causes > > > > > -@option{-Whardened} to warn because > > > > > @option{-fstack-protector-strong} is > > > > > -not enabled by @option{-fhardened}. > > > > > +@option{-Whardened} to warn because > > > > > @option{-fstack-protector-strong} will > > > > > +not be enabled by @option{-fhardened}. > > > > > > > > > > This warning is enabled by default and has effect only when > > > > > @option{-fhardened} > > > > > is enabled. 
> > > > > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > > > > index 92c92996401..d2718d263bb 100644
> > > > > --- a/gcc/gcc.cc
> > > > > +++ b/gcc/gcc.cc
> > > > > @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> > > > >     driver added to dumpdir after dumpbase or linker output name.  */
> > > > >  static bool dumpdir_trailing_dash_added = false;
> > > > >
> > > > > -/* True if -r, -shared, -pie, or -no-pie were specified on the command
> > > > > -   line.  */
> > > > > -static bool any_link_options_p;
> > > > > +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> > > > > +   specified on the command line, and therefore -fhardened should not
> > > > > +   add -z now/relro.  */
> > > > > +static bool avoid_linker_hardening_p;
> > > > >
> > > > >  /* True if -static was specified on the command line.  */
> > > > >  static bool static_p;
> > > > > @@ -4434,10
Re: [PATCH] RISC-V: Avoid more unsplit insns in const expander [PR118832].
On 2/12/25 7:03 AM, Robin Dapp wrote:

Hi,

in PR118832 we have another instance of the problem already noticed in PR117878. We sometimes use e.g. expand_simple_binop for vector operations like shift or and. While this is usually OK, it causes problems when doing it late, e.g. during LRA. In particular, we might rematerialize a const_vector during LRA, which then leaves an insn lying around that cannot be split any more if it requires a pseudo.

Therefore we should only use the split variants in expand_const_vector.

This patch fixes the issue in the PR and also pre-emptively rewrites two other spots that might be prone to the same issue.

Regtested on rv64gcv_zvl512b. As the two other cases don't have a test (so might not even trigger) I unconditionally enabled them for my testsuite run.

Regards
Robin

	PR target/118832

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (expand_const_vector): Expand as
	vlmax insn during lra.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr118832.c: New test.

Pushed to the trunk and I'll update the BZ entry momentarily.

jeff
Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW
On 2/12/25 2:22 PM, Jakub Jelinek wrote:

I agree that the most common cases should be all the arguments the same type. I was working under the assumption that the args would be compatible types already, forgetting that IFNs are different in that regard than other gimple ops.

I wouldn't want to go any further than all three operands the same with the easy to reason about relation checks. For gcc-16 I think we can extend that block fairly easily to handle certain mismatched size cases and we look to see if there's cases where the combination of a relationship between the arguments and some range information would allow us to capture further cases.

For the GCC 16 version, I think best would be (given Andrew's mail that the relations aren't likely very useful for incompatible types) to

  relation_kind rel = VREL_VARYING;
  if (code == MINUS_EXPR
      && types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)))
    {
      rel = query->relation ().query (s, op0, op1);
      /* The result of the infinite precision subtraction of the same
	 values will be always 0.  That will fit into any result type.  */
      if (rel == VREL_EQ)
	return true;
    }

then do the current

  int_range_max vr0, vr1;
  if (!query->range_of_expr (vr0, op0, s) || vr0.undefined_p ())
    vr0.set_varying (TREE_TYPE (op0));
  if (!query->range_of_expr (vr1, op1, s) || vr1.undefined_p ())
    vr1.set_varying (TREE_TYPE (op1));
  tree vr0min = wide_int_to_tree (TREE_TYPE (op0), vr0.lower_bound ());
  tree vr0max = wide_int_to_tree (TREE_TYPE (op0), vr0.upper_bound ());
  tree vr1min = wide_int_to_tree (TREE_TYPE (op1), vr1.lower_bound ());
  tree vr1max = wide_int_to_tree (TREE_TYPE (op1), vr1.upper_bound ());

and then we can e.g. special case > and >=:

  /* If op1 is not negative, op0 - op1 for op0 >= op1 will be always
     in [0, op0] and so if vr0max - vr1min fits into type, there won't
     be any overflow.  */
  if ((rel == VREL_GT || rel == VREL_GE)
      && tree_int_cst_sgn (vr1min) >= 0
      && !arith_overflowed_p (MINUS_EXPR, type, vr0max, vr1min))
    return true;

Would need to think about if anything could be simplified for VREL_G{T,E} if tree_int_cst_sgn (vr1min) < 0. As for VREL_LT, one would need to think it through as well for both tree_int_cst_sgn (vr1min) >= 0 and tree_int_cst_sgn (vr1min) < 0. For the former, the infinite precision result of the subtraction is known, given the relation, to be < 0. Now obviously if TYPE_UNSIGNED (type) that would imply always overflow. But for !TYPE_UNSIGNED (type) that isn't necessarily the case and the question is if the relation helps with the reasoning.

Generally the code otherwise tries to check 2 boundaries (for MULT_EXPR 4, but we don't care about that): if they both don't overflow, it is ok; if only one overflows, we don't know; if both boundaries overflow, we need to look further and check some corner cases in between.

Or just go with that even for GCC 15 (completely untested and dunno if something needs to be done about s = NULL passed to query or not) for now, with the advantage that it can do something even for the cases where type is not compatible with types of arguments, and perhaps add additional cases later?

This is further than I wanted to go for gcc-15. But I can support something like this as it's not a major extension to what I was suggesting. And of course it addresses the correctness issues around different types. I'll play with it a bit.

And WRT an earlier message about gcc-16, yea, I think opening a bug for additional cases would be a good idea.

Jeff
Re: [PATCH] c++: fix propagating REF_PARENTHESIZED_P [PR116379]
On 2/13/25 11:37 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? OK. -- >8 -- Here we have: template<typename T> struct X { T val; decltype(auto) value(){ return (val); } }; where the return type of value should be 'int &' since '(val)' is an expression, not a name, and decltype(auto) performs the type deduction using the decltype rules. The problem is that we weren't propagating REF_PARENTHESIZED_P correctly: the return value of finish_non_static_data_member in this test was a REFERENCE_REF_P, so we didn't set the flag. We should use force_paren_expr like below. PR c++/116379 gcc/cp/ChangeLog: * pt.cc (tsubst_expr) : Use force_paren_expr to set REF_PARENTHESIZED_P. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/decltype-auto9.C: New test. --- gcc/cp/pt.cc| 4 ++-- gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C | 15 +++ 2 files changed, 17 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index a2fc8813e9d..5706a3987c3 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -21712,8 +21712,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) { r = finish_non_static_data_member (member, object, NULL_TREE, complain); - if (TREE_CODE (r) == COMPONENT_REF) - REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t); + if (REF_PARENTHESIZED_P (t)) + force_paren_expr (r); RETURN (r); } else if (type_dependent_expression_p (object)) diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C new file mode 100644 index 000..1ccf95a0170 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C @@ -0,0 +1,15 @@ +// PR c++/116379 +// { dg-do compile { target c++14 } } + +template<typename T> +struct X { + T val; + decltype(auto) value() { return (val); } +}; + +int main() { + int i = 0; + X x{ static_cast(i) }; + using type = decltype(x.value()); + using type = int&; +} base-commit: a134dcd8a010744a0097d190f73a4efc2e381531
[PATCH v3] x86: Properly find the maximum stack slot alignment
On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak wrote: > > On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > > > Don't assume that stack slots can only be accessed by stack or frame > > registers. We first find all registers defined by stack or frame > > registers. Then check memory accesses by such registers, including > > stack and frame registers. > > > > gcc/ > > > > PR target/109780 > > PR target/109093 > > * config/i386/i386.cc (ix86_update_stack_alignment): New. > > (ix86_find_all_reg_use_1): Likewise. > > (ix86_find_all_reg_use): Likewise. > > (ix86_find_max_used_stack_alignment): Also check memory accesses > > from registers defined by stack or frame registers. > > > > gcc/testsuite/ > > > > PR target/109780 > > PR target/109093 > > * g++.target/i386/pr109780-1.C: New test. > > * gcc.target/i386/pr109093-1.c: Likewise. > > * gcc.target/i386/pr109780-1.c: Likewise. > > * gcc.target/i386/pr109780-2.c: Likewise. > > * gcc.target/i386/pr109780-3.c: Likewise. > > Some non-algorithmical changes below, otherwise LGTM. Please also get > someone to review dataflow infrastructure usage, I am not well versed > with it. > > +/* Helper function for ix86_find_all_reg_use. */ > + > +static void > +ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access, > + auto_bitmap &worklist) > +{ > + rtx src = SET_SRC (set); > + if (MEM_P (src)) > > Also reject assignment from CONST_SCALAR_INT? Done. > +return; > + > + rtx dest = SET_DEST (set); > + if (!REG_P (dest)) > +return; > > Can we switch these two so the test for REG_P (dest) will be first? We > are not interested in anything that doesn't assign to a register. Done. > +/* Find all registers defined with REG. 
*/ > + > +static void > +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access, > + unsigned int reg, auto_bitmap &worklist) > +{ > + for (df_ref ref = DF_REG_USE_CHAIN (reg); > + ref != NULL; > + ref = DF_REF_NEXT_REG (ref)) > +{ > + if (DF_REF_IS_ARTIFICIAL (ref)) > +continue; > + > + rtx_insn *insn = DF_REF_INSN (ref); > + if (!NONDEBUG_INSN_P (insn)) > +continue; > > Here we pass only NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X) > > + if (CALL_P (insn) || JUMP_P (insn)) > +continue; > > And here remains only NONJUMP_INSN_P (X), so both above conditions > could be substituted with: > > if (!NONJUMP_INSN_P (X)) > continue; Done. > + > + rtx set = single_set (insn); > + if (set) > +ix86_find_all_reg_use_1 (set, stack_slot_access, worklist); > + > + rtx pat = PATTERN (insn); > + if (GET_CODE (pat) != PARALLEL) > +continue; > + > + for (int i = 0; i < XVECLEN (pat, 0); i++) > +{ > + rtx exp = XVECEXP (pat, 0, i); > + switch (GET_CODE (exp)) > +{ > +case ASM_OPERANDS: > +case CLOBBER: > +case PREFETCH: > +case USE: > + break; > +case UNSPEC: > +case UNSPEC_VOLATILE: > + for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--) > +{ > + rtx x = XVECEXP (exp, 0, j); > + if (GET_CODE (x) == SET) > +ix86_find_all_reg_use_1 (x, stack_slot_access, > + worklist); > +} > + break; > +case SET: > + ix86_find_all_reg_use_1 (exp, stack_slot_access, > + worklist); > + break; > +default: > + debug_rtx (exp); > > Stray debug remaining? Removed. > + HARD_REG_SET stack_slot_access; > + CLEAR_HARD_REG_SET (stack_slot_access); > + > + /* Stack slot can be accessed by stack pointer, frame pointer or > + registers defined by stack pointer or frame pointer. */ > + auto_bitmap worklist; > > Please put a line of vertical space here ... Done. > + add_to_hard_reg_set (&stack_slot_access, Pmode, > + STACK_POINTER_REGNUM); > + bitmap_set_bit (worklist, STACK_POINTER_REGNUM); > > ... here ... Done. 
> + if (frame_pointer_needed) > +{ > + add_to_hard_reg_set (&stack_slot_access, Pmode, > + HARD_FRAME_POINTER_REGNUM); > + bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM); > +} > > ... here ... > Done. > + unsigned int reg; > > ... here ... Done. > + do > +{ > + reg = bitmap_clear_first_set_bit (worklist); > + ix86_find_all_reg_use (stack_slot_access, reg, worklist); > +} > + while (!bitmap_empty_p (worklist)); > + > + hard_reg_set_iterator hrsi; > > ... here ... Done. > + EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi) > +for (df_ref ref = DF_REG_USE_CHAIN (reg); > + ref != NULL; > + ref = DF_REF_NEXT_REG (ref)) > + { > +if (DF_REF_IS_ARTIFICIAL (ref)) > + continue; > + > +rtx_insn *insn = DF_REF_INSN (ref); > > ... and here. > Done. > +if (!NONDEBUG_INSN_P (insn)) > > !NONJUMP_INSN_P ? Changed. > + continue; > > Also some vertical space here. Done. > +note_stores (insn, ix86_
[PATCH] libstdc++: Improve list assumption after constructor [PR118865]
The code example here does:
```
if (begin == end)
  __builtin_unreachable();
std::list nl(begin, end);
for (auto it = nl.begin(); it != nl.end(); it++) {
  ...
}
/* Remove the first element of the list. */
nl.erase(nl.begin());
```
And we get a warning because we jump threaded the case where we think the list was empty coming out of the for loop, even though we populated it from a non-empty range. So we can help the compiler here by recording that, after initializing the list from a non-empty range, the list is not empty either.

This removes the -Wfree-nonheap-object warning for the first reduced testcase (with the fix for the `begin == end` case added) in PR 118865; the second reduced testcase has been filed off as PR 118867.

Bootstrapped and tested on x86_64-linux-gnu.

libstdc++-v3/ChangeLog:

	PR libstdc++/118865
	* include/bits/stl_list.h (_M_initialize_dispatch): Add an
	unreachable so that if the input range was not empty, the list
	is known to be non-empty afterwards.

Signed-off-by: Andrew Pinski
---
 libstdc++-v3/include/bits/stl_list.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index be33eeb03d4..f987d8b9d0a 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -2384,12 +2384,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
       _M_initialize_dispatch(_InputIterator __first, _InputIterator __last,
			     __false_type)
       {
+	bool __notempty = __first != __last;
 	for (; __first != __last; ++__first)
 #if __cplusplus >= 201103L
 	  emplace_back(*__first);
 #else
 	  push_back(*__first);
 #endif
+	if (__notempty)
+	  {
+	    if (begin() == end())
+	      __builtin_unreachable();
+	  }
       }

     // Called by list(n,v,a), and the range constructor when it turns out
--
2.43.0
Re: [patch, fortran] PR117430 gfortran allows type(C_ptr) in I/O list
On 2/13/25 1:42 PM, Harald Anlauf wrote: On 12.02.25 at 21:49, Jerry D wrote: The attached patch is fairly obvious. The use of notify_std is changed to a gfc_error. Several test cases had to be adjusted. Regression tested on x86_64. OK for trunk? This is not a review, just some random comments on the testsuite changes by your patch: I will update and give the i variables declarations. I just tried integer and it worked. Of course you are correct to declare these as type(c_ptr). Regarding the use of transfer, I will fix those as well. The patch itself is trivial so I will wait a day or so for any other comments. Thanks for the feedback. Jerry diff --git a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 index 4c2a7d657ee..92bfca4363d 100644 --- a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 +++ b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 @@ -1,5 +1,4 @@ ! { dg-do compile } -! { dg-options "" } ! ! PR fortran/56378 ! PR fortran/52426 @@ -24,5 +23,5 @@ contains end module use iso_c_binding -print *, c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" } +i = c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" } ^^^ i is not declared a type(c_ptr) end diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 index 4ce1c6809e4..834570cb74d 100644 --- a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 @@ -1,5 +1,4 @@ ! { dg-do run } -! { dg-options "-std=gnu" } ! This test case exists because gfortran had an error in converting the ! expressions for the derived types from iso_c_binding in some cases. module c_ptr_tests_10 @@ -7,7 +6,7 @@ module c_ptr_tests_10 contains subroutine sub0() bind(c) - print *, 'c_null_ptr is: ', c_null_ptr + print *, 'c_null_ptr is: ', transfer (cptr, C_LONG_LONG) This does not do what one naively might think.
transfer (cptr, C_LONG_LONG) == transfer (cptr, 0) You probably want: transfer (cptr, 0_C_INTPTR_T) end subroutine sub0 end module c_ptr_tests_10 diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 index 5a32553b8c5..711b9c157d4 100644 --- a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 +++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 @@ -16,9 +16,9 @@ contains type(myF90Derived), pointer :: my_f90_type_ptr my_f90_type%my_c_ptr = c_null_ptr - print *, 'my_f90_type is: ', my_f90_type%my_c_ptr + print *, 'my_f90_type is: ', transfer(my_f90_type%my_c_ptr, C_LONG_LONG) my_f90_type_ptr => my_f90_type - print *, 'my_f90_type_ptr is: ', my_f90_type_ptr%my_c_ptr + print *, 'my_f90_type_ptr is: ', transfer(my_f90_type_ptr%my_c_ptr, C_LONG_LONG) end subroutine sub0 end module c_ptr_tests_9 Likewise. diff --git a/gcc/testsuite/gfortran.dg/init_flag_17.f90 b/gcc/testsuite/gfortran.dg/init_flag_17.f90 index 401830fccbc..8bb9f7b1ef7 100644 --- a/gcc/testsuite/gfortran.dg/init_flag_17.f90 +++ b/gcc/testsuite/gfortran.dg/init_flag_17.f90 @@ -19,8 +19,8 @@ program init_flag_17 type(ty) :: t - print *, t%ptr - print *, t%fptr + print *, transfer(t%ptr, c_long_long) + print *, transfer(t%fptr, c_long_long) end program Likewise. diff --git a/gcc/testsuite/gfortran.dg/pr32601_1.f03 b/gcc/testsuite/gfortran.dg/pr32601_1.f03 index a297e1728ec..1a48419112d 100644 --- a/gcc/testsuite/gfortran.dg/pr32601_1.f03 +++ b/gcc/testsuite/gfortran.dg/pr32601_1.f03 @@ -4,9 +4,9 @@ ! PR fortran/32601 use, intrinsic :: iso_c_binding, only: c_loc, c_ptr implicit none - +integer i ! This was causing an ICE, but is an error because the argument to C_LOC ! needs to be a variable. -print *, c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" } +i = c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" } end Again, i should be declared as type(c_ptr). 
Cheers, Harald Regards, Jerry Author: Jerry DeLisle Date: Tue Feb 11 20:57:50 2025 -0800 Fortran: gfortran allows type(C_ptr) in I/O list Before this patch, gfortran was accepting invalid use of type(c_ptr) in I/O statements. The fix affects several existing test cases so no new test case needed. Existing tests were modified to pass by either using the transfer function to convert to an acceptable value or using an assignment to a like type (non-I/O). PR fortran/117430 gcc/fortran/ChangeLog: * resolve.cc (resolve_transfer): Issue the error with no exceptions allowed. gcc/testsuite/ChangeLog: * gfortran.dg/c_loc_test_17.f90: Modify to pass. * gfortran.dg/c_ptr_tests_10.f03: Likewise. * gfortran.dg/c_ptr_tes
Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
On 2/14/25 04:58, Jeff Law wrote:
> I'd guess it's more work than it'd be worth.  We're just not seeing
> vsetvls being all that problematical on our design.  I do see a lot of
> seemingly gratuitous changes in the vector config, but when we make
> changes to fix that we generally end up with worse performing code.

To be clear, the VSETVLs on their own are not problematic for us either. Them causing VL=0 is.

I have a change in the works which could introduce additional VSETVLs ;-)

-Vineet
Re: [PATCH] tree-optimization/90579 - avoid STLF fail by better optimizing
On 2/12/25 7:58 AM, Richard Biener wrote: For the testcase in question, which uses a fold-left vectorized reduction of a reverse iterating loop, we'd need two forwprop invocations to first bypass the permute emitted for the reverse iterating loop and then to decompose the vector load that only feeds element extracts. The following moves the first transform to a match.pd pattern and makes sure we fold the element extracts when the vectorizer emits them, so the single forwprop pass can then pick up the vector load decomposition, avoiding the forwarding fail this otherwise causes. Moving simplify_bitfield_ref also makes forwprop remove the dead VEC_PERM_EXPR via the simple-dce it uses - this was also previously missing. Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? Thanks, Richard. PR tree-optimization/90579 * tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to match.pd. (pass_forwprop::execute): Adjust. * match.pd (bit_field_ref (vec_perm ...)): New pattern modeled after simplify_bitfield_ref. * tree-vect-loop.cc (vect_expand_fold_left): Fold the element extract stmt, combining it with the vector def. OK. Jeff
[PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]
Hi all, When AVX512 is not explicitly set, we should not take the EVEX512 bit into consideration when checking vector size. This solves the problem of the intrin header files reporting warnings when compiling with -Wsystem-headers. However, there is a side effect on the usage '-march=xxx -mavx10.1-256', where xxx is an arch with AVX512: it will not report a warning on the vector size conflict for now. Since it is a rare usage, we will take it. Ok for trunk and backport to GCC 14? gcc/ChangeLog: PR target/118815 * config/i386/i386-options.cc (ix86_option_override_internal): Do not check vector size conflict when AVX512 is not explicitly set. gcc/testsuite/ChangeLog: PR target/118815 * gcc.target/i386/pr118815.c: New test. --- gcc/config/i386/i386-options.cc | 1 + gcc/testsuite/gcc.target/i386/pr118815.c | 9 + 2 files changed, 10 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr118815.c diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 3467ab0bbeb..7e85334d3d3 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -2711,6 +2711,7 @@ ix86_option_override_internal (bool main_args_p, "using 512 as max vector size"); } else if (TARGET_AVX512F_P (opts->x_ix86_isa_flags) + && (opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512F) && !(OPTION_MASK_ISA2_EVEX512 & opts->x_ix86_isa_flags2_explicit)) warning (0, "Vector size conflicts between AVX10.1 and AVX512, using " diff --git a/gcc/testsuite/gcc.target/i386/pr118815.c b/gcc/testsuite/gcc.target/i386/pr118815.c new file mode 100644 index 000..84308fce08a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr118815.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64-v3" } */ + +#pragma GCC push_options +#pragma GCC target("avx10.2-256") + +void foo(); + +#pragma GCC pop_options -- 2.31.1
Re: [pushed][PATCH v3 0/4] Organize the code and fix PR118828 and PR118843.
Pushed to r15-7521..r15-7524. On 2025/2/13 at 8:59 PM, Lulu Cheng wrote: v1 -> v2: 1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE} __loongarch_{div32,am_bh,amcas,ld_seq_sa} and __loongarch_version_major/__loongarch_version_minor to update function. 2. Fixed PR118843. 3. Add testsuites. v2 -> v3: 1. Modify test cases (pr118828-3.c pr118828-4.c). Lulu Cheng (4): LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc. LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions. LoongArch: After setting the compilation options, update the predefined macros. LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843]. gcc/config/loongarch/loongarch-c.cc | 204 +- gcc/config/loongarch/loongarch-protos.h | 1 + gcc/config/loongarch/loongarch-target-attr.cc | 48 - .../gcc.target/loongarch/pr118828-2.c | 30 +++ .../gcc.target/loongarch/pr118828-3.c | 32 +++ .../gcc.target/loongarch/pr118828-4.c | 32 +++ gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 +++ gcc/testsuite/gcc.target/loongarch/pr118843.c | 6 + 8 files changed, 287 insertions(+), 100 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c
Re: [pushed] [PATCH v2] LoongArch: Adjust the cost of ADDRESS_REG_REG.
Pushed to r15-7525. On 2025/2/13 at 4:40 PM, Lulu Cheng wrote: After changing this cost from 1 to 3, the performance of the SPEC2006 benchmarks 401, 473, 416, 465 and 482 can be improved by about 2% on LA664. Add option '-maddr-reg-reg-cost='. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Add option '-maddr-reg-reg-cost='. * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize addr_reg_reg_cost to 3. * config/loongarch/loongarch-opts.cc (loongarch_target_option_override): If '-maddr-reg-reg-cost=' is not used, set it to the initial value. * config/loongarch/loongarch-tune.h (struct loongarch_rtx_cost_data): Add the member addr_reg_reg_cost and its assignment function to the structure loongarch_rtx_cost_data. * config/loongarch/loongarch.cc (loongarch_address_insns): Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch.opt.urls: Regenerate. * doc/invoke.texi: Add description of '-maddr-reg-reg-cost='. gcc/testsuite/ChangeLog: * gcc.target/loongarch/const-double-zero-stx.c: Add '-maddr-reg-reg-cost=1'. * gcc.target/loongarch/stack-check-alloca-1.c: Likewise.
Change-Id: I8fbf7a6d073b16c7829b1a9a8d239b131d53ab1b --- gcc/config/loongarch/genopts/loongarch.opt.in | 4 gcc/config/loongarch/loongarch-def.cc | 1 + gcc/config/loongarch/loongarch-opts.cc | 3 +++ gcc/config/loongarch/loongarch-tune.h | 7 +++ gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.opt | 4 gcc/config/loongarch/loongarch.opt.urls| 3 +++ gcc/doc/invoke.texi| 7 ++- gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +- gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c | 2 +- 10 files changed, 31 insertions(+), 4 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8c292c8600d..39c1545e540 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -177,6 +177,10 @@ mbranch-cost= Target RejectNegative Joined UInteger Var(la_branch_cost) Save -mbranch-cost=COSTSet the cost of branches to roughly COST instructions. +maddr-reg-reg-cost= +Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save +-maddr-reg-reg-cost=COST Set the cost of ADDRESS_REG_REG to the value calculated by COST. + mcheck-zero-division Target Mask(CHECK_ZERO_DIV) Save Trap on integer divide by zero. diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index b0271eb3b9a..5f235a04ef2 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () movcf2gr (COSTS_N_INSNS (7)), movgr2cf (COSTS_N_INSNS (15)), branch_cost (6), +addr_reg_reg_cost (3), memory_latency (4) {} /* The following properties cannot be looked up directly using "cpucfg". 
diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index 36342cc9373..c2a63f75fc2 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target *target, if (!opts_set->x_la_branch_cost) opts->x_la_branch_cost = loongarch_cost->branch_cost; + if (!opts_set->x_la_addr_reg_reg_cost) +opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost; + /* other stuff */ if (ABI_LP64_P (target->abi.base)) opts->x_flag_pcc_struct_return = 0; diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h index e69173ebf79..f7819fe7678 100644 --- a/gcc/config/loongarch/loongarch-tune.h +++ b/gcc/config/loongarch/loongarch-tune.h @@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data unsigned short movcf2gr; unsigned short movgr2cf; unsigned short branch_cost; + unsigned short addr_reg_reg_cost; unsigned short memory_latency; /* Default RTX cost initializer, implemented in loongarch-def.cc. */ @@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data return *this; } + loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short _addr_reg_reg_cost) + { +addr_reg_reg_cost = _addr_reg_reg_cost; +return *this; + } + loongarch_rtx_cost_data memory_latency_ (unsigned short _memory_latency) { memory_latency = _memory_latency; diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index e9978370e8c..495b