[Bug tree-optimization/110896] [12/13/14 Regression] gcc.dg/ubsan/pr81981.c is xfailed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110896

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-08-04

--- Comment #1 from Richard Biener ---
We simplify this to t[0] * 2 (by luck, so it could also be u[0] * 2), which means we lose track of the use of the other variable.

Value numbering stmt = t$0_4 = PHI
Setting value number of t$0_4 to t$0_4 (changed)
Making available beyond BB4 t$0_4 for value t$0_4
Value numbering stmt = u$0_12 = PHI
Marking CSEd to PHI node t$0_4 = PHI
Setting value number of u$0_12 to t$0_4 (changed)
...
Replaced redundant PHI node defining u$0_12 with t$0_4
gimple_simplified to _9 = t$0_4 * 2;

Early uninit sees a conditional init of the memory and so refrains from diagnosing this.

I suppose this is kind-of a duplicate of the many missed uninit diagnostics because of CCP optimistic propagation (and only because we don't do optimistic copyprop we do not have even more such cases).
[Bug middle-end/101955] (signed<<
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101955

--- Comment #6 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:9020da78df2854f14f8b1d38b58a6d3b77a4b731

commit r14-2977-g9020da78df2854f14f8b1d38b58a6d3b77a4b731
Author: Drew Ross
Date:   Fri Aug 4 09:08:05 2023 +0200

    match.pd: Canonicalize (signed x << c) >> c [PR101955]

    Canonicalizes (signed x << c) >> c into the lowest
    precision(type) - c bits of x IF those bits have a mode precision or a
    precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
    ((unsigned)-1 >> c) to prevent duplicate pattern.

            PR middle-end/101955

            * match.pd ((signed x << c) >> c): New canonicalization.

            * gcc.dg/pr101955.c: New test.
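
The canonicalization described in the commit message can be illustrated with a small C sketch. It is not taken from the commit or from gcc.dg/pr101955.c; the function names are made up for illustration, and it assumes a 32-bit type with c = 24, so that the remaining 32 - 24 = 8 bits have a mode precision. It also relies on GCC's documented behaviour for implementation-defined conversions and for right shifts of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* (signed x << 24) >> 24.  The left shift is done on the unsigned
   representation to avoid signed-overflow issues in ISO C; the right
   shift of a negative value is implementation-defined and arithmetic
   with GCC.  */
int32_t shl_shr_24(int32_t x)
{
  return (int32_t)((uint32_t)x << 24) >> 24;
}

/* The canonical form: keep the lowest 8 bits of x and sign-extend them.
   The narrowing conversion is implementation-defined (modulo 2^8 with
   GCC).  */
int32_t sext_low_8(int32_t x)
{
  return (int32_t)(int8_t)x;
}

/* Unsigned counterpart mentioned in the commit message:
   (x << c) >> c  ->  x & ((unsigned)-1 >> c).  */
uint32_t ushl_shr(uint32_t x, unsigned c)
{
  return (x << c) >> c;
}

uint32_t umask(uint32_t x, unsigned c)
{
  return x & (UINT32_MAX >> c);
}

/* Hypothetical self-check over a handful of sample values.  */
void check_shift_forms(void)
{
  const int32_t samples[] = { 0, 1, -1, 127, 128, 255, 256,
                              0x12345678, INT32_MIN, INT32_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      assert(shl_shr_24(samples[i]) == sext_low_8(samples[i]));
      for (unsigned c = 0; c < 32; c++)
        assert(ushl_shr((uint32_t)samples[i], c)
               == umask((uint32_t)samples[i], c));
    }
}
```

Calling check_shift_forms() from a test driver exercises both equivalences on the sample values.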
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-08-04
             Target|riscv                       |riscv, x86_64-*-*
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947
          Component|target                      |tree-optimization

--- Comment #3 from Richard Biener ---
It looks like you don't support vector short logical shift? For some reason vect_recog_over_widening_pattern doesn't check whether the demoted operation is supported ...

The following helps on x86_64, it disables the demotion. I think the idea was that we eventually recognize a widening shift, so the narrow operation itself doesn't need to be supported, but clearly that doesn't work out when there is no such shift.

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index e4ab8c2d65b..4e4191652e3 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
   if (!new_vectype || !op_vectype)
     return NULL;
 
+  optab optab;
+  if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector))
+      || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing)
+    return NULL;
+
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n",
                      type, new_type);

With the patch above x86 can vectorize both loops with AVX2 but not without.

Can you confirm this helps on RISC-V as well?

Richard, what was the idea here?

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #4 from JuzheZhong --- (In reply to Richard Biener from comment #3) > it looks like you don't support vector short logical shift? For some reason > vect_recog_over_widening_pattern doesn't check whether the demoted operation > is supported ... > > The following helps on x86_64, it disables the demotion. I think the idea > was that we eventually recognize a widening shift, so the narrow operation > itself doesn't need to be supported, but clearly that doesn't work out > when there is no such shift. > Thanks Richi. what is the "vector short logical shift" optab ? Could you give me the optab name? I am gonna try to support this in RISC-V port. > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index e4ab8c2d65b..4e4191652e3 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo, >if (!new_vectype || !op_vectype) > return NULL; > > + optab optab; > + if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > + || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > +return NULL; > + >if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n", > type, new_type); > > with the patch above x86 can vectorize both loops with AVX2 but not without. > > Can you confirm this helps on RISC-V as well? > > Richard, what was the idea here? Yeah. I can try it after I try "vector short logical shift" pattern.
[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874 --- Comment #15 from CVS Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:91c963ea6f845a0c59b7523a5330b8d3ed1beb6a commit r14-2978-g91c963ea6f845a0c59b7523a5330b8d3ed1beb6a Author: Andrew Pinski Date: Wed Aug 2 14:49:00 2023 -0700 Fix PR 110874: infinite loop in gimple_bitwise_inverted_equal_p with fre This changes gimple_bitwise_inverted_equal_p to use a 2 different match patterns to try to match bit_not wrapped with a possible nop_convert and a comparison also wrapped with a possible nop_convert. This is to avoid being recursive. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/110874 * gimple-match-head.cc (gimple_bit_not_with_nop): New declaration. (gimple_maybe_cmp): Likewise. (gimple_bitwise_inverted_equal_p): Rewrite to use gimple_bit_not_with_nop and gimple_maybe_cmp instead of being recursive. * match.pd (bit_not_with_nop): New match pattern. (maybe_cmp): Likewise. gcc/testsuite/ChangeLog: PR tree-optimization/110874 * gcc.c-torture/compile/pr110874-a.c: New test.
[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #16 from Andrew Pinski --- Fixed.
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #5 from JuzheZhong --- (In reply to Richard Biener from comment #3) > it looks like you don't support vector short logical shift? For some reason > vect_recog_over_widening_pattern doesn't check whether the demoted operation > is supported ... > > The following helps on x86_64, it disables the demotion. I think the idea > was that we eventually recognize a widening shift, so the narrow operation > itself doesn't need to be supported, but clearly that doesn't work out > when there is no such shift. > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index e4ab8c2d65b..4e4191652e3 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo, >if (!new_vectype || !op_vectype) > return NULL; > > + optab optab; > + if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > + || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > +return NULL; > + >if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n", > type, new_type); > > with the patch above x86 can vectorize both loops with AVX2 but not without. > > Can you confirm this helps on RISC-V as well? > > Richard, what was the idea here? Hi, Richi. I guess you mean "vector short logical shift" pattern is this: (define_insn_and_split "v3" [(set (match_operand:VI 0 "register_operand" "=vr,vr") (any_shift:VI (match_operand:VI 1 "register_operand" " vr,vr") (match_operand:VI 2 "vector_shift_operand" " vr,vk")))] "TARGET_VECTOR && can_create_pseudo_p ()" "#" "&& 1" [(const_int 0)] { riscv_vector::emit_vlmax_insn (code_for_pred (, mode), riscv_vector::RVV_BINOP, operands); DONE; } [(set_attr "type" "vshift") (set_attr "mode" "")]) (define_code_iterator any_shift [ashift ashiftrt lshiftrt]) VI includes vector short. I think RISCV port support vector short logical shift ?
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #9 from Rainer Orth --- Created attachment 55684 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55684&action=edit 64-bit sparc-sun-solaris2.11 cmp-mem-const-1.c.289r.combine
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869

--- Comment #10 from Rainer Orth ---
The tests still FAIL on Solaris/SPARC:

FAIL: gcc.dg/cmp-mem-const-1.c scan-rtl-dump combine "narrow comparison from mode .I to QI"
FAIL: gcc.dg/cmp-mem-const-2.c scan-rtl-dump combine "narrow comparison from mode .I to QI"
FAIL: gcc.dg/cmp-mem-const-3.c scan-rtl-dump combine "narrow comparison from mode .I to HI"
FAIL: gcc.dg/cmp-mem-const-4.c scan-rtl-dump combine "narrow comparison from mode .I to HI"
FAIL: gcc.dg/cmp-mem-const-5.c scan-rtl-dump combine "narrow comparison from mode .I to SI"
FAIL: gcc.dg/cmp-mem-const-6.c scan-rtl-dump combine "narrow comparison from mode .I to SI"
[Bug tree-optimization/110891] [14 Regression] Dead Code Elimination Regression since r14-2674-gd0de3bf9175
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110891

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #2 from Richard Biener ---
I didn't anticipate the trick triggering with FRE but we have

Value numbering stmt = _6 = _5 | 64;
Setting value number of _6 to _6 (changed)
...
Value numbering stmt = _9 = _8 & 5;
Setting value number of _9 to _9 (changed)
...
Replaced a with _6 in all uses of _8 = a;
Applying pattern match.pd:184, gimple-match-10.cc:6142
Applying pattern match.pd:1962, gimple-match-6.cc:16850
gimple_simplified to _10 = _5 & 5;
_9 = _10;

That's the old issue that when we are recursively simplifying pattern results like

  (bit_ior (bit_and @0 @2) (bit_and! @1 @2)))

we need to push operands, but when any outer operation simplifies away we can't (or rather do not) pop them again (also when asked to never push we'd fail the pattern before trying to simplify the outer operation). That can then result in such stray copies appearing.

So the first IL difference is

--- a/t.c.114t.fre3 2023-08-04 09:22:55.380428835 +0200
+++ b/t.c.114t.fre3 2023-08-04 09:21:50.455470894 +0200
@@ -159,7 +159,7 @@
   a = _6;
   _10 = _5 & 5;
   _9 = _10;
-  a = _9;
+  a = _10;
   return 0;
 }

but that vanishes in copyprop1. ifcombine then gets different SSA names assigned, which means different association of bitwise or operations. For some reason this causes the divergence in DOM2. After copyprop2 we have

-FREE_SSANAMES: 12, 21, 3, 4, 7, 16, 15, 19, 22, 26, 17, 27, 13, 6, 20, 25, 8, 18, 24, 9,
+FREE_SSANAMES: 12, 21, 3, 4, 7, 16, 15, 19, 22, 26, 17, 27, 9, 13, 6, 20, 25, 8, 18, 24,

so the same SSA names are in the freelist, but as that is unordered we pick different names when re-using. In the DOM2 pass you can see that ranger behaves slightly differently when processing operands in a different order for commutative operations like bitwise or in this case, and that leads to the observed difference in threading.

Tracing ranger reveals too many differences. In the end I'd say "bad luck", but maybe ranger folks want to investigate as well? I'm not convinced we need to sort FREE_SSANAMES; solving the slightly imperfect simplification for match would be nice.
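
The arithmetic behind the simplification in the dump can be checked with a short C sketch. The function names are invented for illustration and none of this is GCC code; it only assumes that the pattern result (bit_ior (bit_and @0 @2) (bit_and! @1 @2)) corresponds to distributing the mask over the bitwise or. With @1 = 64 and @2 = 5 the inner term 64 & 5 folds to 0, so the outer bit_ior simplifies away and only _5 & 5 remains.

```c
#include <assert.h>
#include <stdint.h>

/* Shape before simplification: _6 = _5 | 64; _9 = _6 & 5.  */
uint32_t or_then_and(uint32_t x)
{
  return (x | 64u) & 5u;
}

/* Shape after simplification: _10 = _5 & 5.  */
uint32_t and_only(uint32_t x)
{
  return x & 5u;
}

/* Both sides of the distribution (a | b) & c == (a & c) | (b & c).  */
uint32_t and_of_ior(uint32_t a, uint32_t b, uint32_t c)
{
  return (a | b) & c;
}

uint32_t ior_of_ands(uint32_t a, uint32_t b, uint32_t c)
{
  return (a & c) | (b & c);
}

/* Hypothetical self-check on a small range of values.  */
void check_bit_identities(void)
{
  assert((64u & 5u) == 0u);  /* the term that folds to zero */
  for (uint32_t x = 0; x < 1024; x++)
    {
      assert(or_then_and(x) == and_only(x));
      assert(and_of_ior(x, 64u, 5u) == ior_of_ands(x, 64u, 5u));
      assert(and_of_ior(x, x >> 3, 0x55u) == ior_of_ands(x, x >> 3, 0x55u));
    }
}
```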
[Bug tree-optimization/82397] [8 Regression] qsort comparator non-negative on sorted output: 1 in vect_analyze_data_ref_accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82397 Bug 82397 depends on bug 82446, which changed state. Bug 82446 Summary: [11/12/13/14 Regression] Missed equalities in dr_group_sort_cmp https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82446 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/82446] [11/12/13/14 Regression] Missed equalities in dr_group_sort_cmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82446 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #12 from Richard Biener --- I don't think anything changed here, but I don't see any actual testcase where we could verify things so let's close this bug.
[Bug ada/110898] New: compilation of adacl-assert-integer.ads failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110898

            Bug ID: 110898
           Summary: compilation of adacl-assert-integer.ads failed
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krischik at users dot sourceforge.net
                CC: dkm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55685&action=edit
Source code

# the exact version of GCC, as shown by "gcc -v";

> >alr exec -P1 -- gcc --version
> ⓘ Synchronizing workspace...
> Dependencies automatically updated as follows:
>
>+♼ gnat 12.1.2 (new,installed,gnat_native)
>
> gcc (GCC) 12.1.0
> Copyright (C) 2022 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# the system type;

macOS 12.6.7

# the options when GCC was configured/built;

> >alr exec -P1 -- gcc -v
> Using built-in specs.
> COLLECT_GCC=/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/gcc > COLLECT_LTO_WRAPPER=/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../libexec/gcc/x86_64-apple-darwin19.6.0/12.1.0/lto-wrapper > Target: x86_64-apple-darwin19.6.0 > Configured with: ../src/configure > --prefix=/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/x86_64-darwin/gcc/install > --enable-languages=c,ada,c++ --enable-libstdcxx --enable-libstdcxx-threads > --enable-libada --disable-nls --without-libiconv-prefix > --disable-libstdcxx-pch --enable-lto --disable-multilib --disable-libcilkrts > --without-build-config > --with-build-sysroot=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk > > --with-specs='%{!sysroot=*:--sysroot=%:if-exists-else(/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk > /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk)}' > --with-mpfr=/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/x86_64-darwin/mpfr/install > > --with-gmp=/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/x86_64-darwin/gmp/install > > --with-mpc=/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/x86_64-darwin/mpc/install > --build=x86_64-apple-darwin19.6.0 > Thread model: posix > Supported LTO compression algorithms: zlib > gcc version 12.1.0 (GCC) > COMPILER_PATH=/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../libexec/gcc/x86_64-apple-darwin19.6.0/12.1.0/:/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../libexec/gcc/ > 
LIBRARY_PATH=/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc/x86_64-apple-darwin19.6.0/12.1.0/:/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc/:/usr/local/lib/:/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc/x86_64-apple-darwin19.6.0/12.1.0/../../../ > COLLECT_GCC_OPTIONS='-P' '-v' '-mmacosx-version-min=12.5.0' > '-asm_macosx_version_min=12.5' '-nodefaultexport' '-mtune=core2' > '--sysroot=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk' > '-dumpdir' 'a.' > > /Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../libexec/gcc/x86_64-apple-darwin19.6.0/12.1.0/collect2 > -syslibroot > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/ > -dynamic -arch x86_64 -macosx_version_min 12.5.0 -o a.out > -L/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc/x86_64-apple-darwin19.6.0/12.1.0 > > -L/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc > -L/usr/local/lib > -L/Users/martin/.config/alire/cache/dependencies/gnat_native_12.1.2_587b912f/bin/../lib/gcc/x86_64-apple-darwin19.6.0/12.1.0/../../.. > adacl.gpr -lemutls_w -lgcc -lSystem -no_compact_unwind > ld: warning: ignoring file adacl.gpr, building for macOS-x86_64 but > attempting to link with file built for unknown-unsupported file format ( 0x2D > 0x2D 0x2D 0x2D 0x2D 0x2D 0x2D 0x2D 0x
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #11 from Stefan Schulze Frielinghaus --- Created attachment 55686 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55686&action=edit Increase optimization
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #12 from Stefan Schulze Frielinghaus --- I have done a test with a cross-compiler and it looks to me as if we need -O2 instead of -O1 on Sparc in order to trigger the optimization. Can you give the attached patch a try? Sorry for all the hassle.
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #6 from JuzheZhong --- (In reply to Richard Biener from comment #3) > it looks like you don't support vector short logical shift? For some reason > vect_recog_over_widening_pattern doesn't check whether the demoted operation > is supported ... > > The following helps on x86_64, it disables the demotion. I think the idea > was that we eventually recognize a widening shift, so the narrow operation > itself doesn't need to be supported, but clearly that doesn't work out > when there is no such shift. > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index e4ab8c2d65b..4e4191652e3 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo, >if (!new_vectype || !op_vectype) > return NULL; > > + optab optab; > + if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > + || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > +return NULL; > + >if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n", > type, new_type); > > with the patch above x86 can vectorize both loops with AVX2 but not without. > > Can you confirm this helps on RISC-V as well? > > Richard, what was the idea here? Hi, Richi. I try this codes as you suggested: optab optab; if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) return NULL; [jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/gcc-rv64/bin/riscv64-rivai-elf-gcc -march=rv64gcv -O3 --param=riscv-autovec-preference=scalable -S -fopt-info-vec-missed rvv.c rvv.c:14:1: missed: couldn't vectorize loop rvv.c:14:1: missed: not vectorized: no vectype for stmt: _4 = *_3; scalar_type: uint16_t Still can not vectorize it.
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #13 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #12 from Stefan Schulze Frielinghaus ibm.com> --- > I have done a test with a cross-compiler and it looks to me as if we need -O2 > instead of -O1 on Sparc in order to trigger the optimization. Can you give > the > attached patch a try? Sorry for all the hassle. We're getting there. The -1.c and -2.c tests PASS now, however the rest still FAILs: they lack "narrow comparison"... completely.
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #7 from Richard Biener --- (In reply to JuzheZhong from comment #5) > (In reply to Richard Biener from comment #3) > > it looks like you don't support vector short logical shift? For some reason > > vect_recog_over_widening_pattern doesn't check whether the demoted operation > > is supported ... > > > > The following helps on x86_64, it disables the demotion. I think the idea > > was that we eventually recognize a widening shift, so the narrow operation > > itself doesn't need to be supported, but clearly that doesn't work out > > when there is no such shift. > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index e4ab8c2d65b..4e4191652e3 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo, > >if (!new_vectype || !op_vectype) > > return NULL; > > > > + optab optab; > > + if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > > + || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > > +return NULL; > > + > >if (dump_enabled_p ()) > > dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n", > > type, new_type); > > > > with the patch above x86 can vectorize both loops with AVX2 but not without. > > > > Can you confirm this helps on RISC-V as well? > > > > Richard, what was the idea here? > > Hi, Richi. 
> > I guess you mean "vector short logical shift" pattern is this: > > (define_insn_and_split "v3" > [(set (match_operand:VI 0 "register_operand" "=vr,vr") > (any_shift:VI > (match_operand:VI 1 "register_operand" " vr,vr") > (match_operand:VI 2 "vector_shift_operand" " vr,vk")))] > "TARGET_VECTOR && can_create_pseudo_p ()" > "#" > "&& 1" > [(const_int 0)] > { > riscv_vector::emit_vlmax_insn (code_for_pred (, mode), >riscv_vector::RVV_BINOP, operands); > DONE; > } > [(set_attr "type" "vshift") > (set_attr "mode" "")]) > > (define_code_iterator any_shift [ashift ashiftrt lshiftrt]) > > VI includes vector short. > > I think RISCV port support vector short logical shift ? The optab is vlshr_optab: OPTAB_VC(vlshr_optab, "vlshr$a3", LSHIFTRT) your define_insn maybe produces the wrong names?
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #8 from Richard Biener --- (In reply to JuzheZhong from comment #6) > (In reply to Richard Biener from comment #3) > > it looks like you don't support vector short logical shift? For some reason > > vect_recog_over_widening_pattern doesn't check whether the demoted operation > > is supported ... > > > > The following helps on x86_64, it disables the demotion. I think the idea > > was that we eventually recognize a widening shift, so the narrow operation > > itself doesn't need to be supported, but clearly that doesn't work out > > when there is no such shift. > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index e4ab8c2d65b..4e4191652e3 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -3091,6 +3091,11 @@ vect_recog_over_widening_pattern (vec_info *vinfo, > >if (!new_vectype || !op_vectype) > > return NULL; > > > > + optab optab; > > + if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > > + || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > > +return NULL; > > + > >if (dump_enabled_p ()) > > dump_printf_loc (MSG_NOTE, vect_location, "demoting %T to %T\n", > > type, new_type); > > > > with the patch above x86 can vectorize both loops with AVX2 but not without. > > > > Can you confirm this helps on RISC-V as well? > > > > Richard, what was the idea here? > > Hi, Richi. 
> > I try this codes as you suggested: > optab optab; > if (!(optab = optab_for_tree_code (code, op_vectype, optab_vector)) > || optab_handler (optab, TYPE_MODE (op_vectype)) == CODE_FOR_nothing) > return NULL; > > [jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/ > gcc-rv64/bin/riscv64-rivai-elf-gcc -march=rv64gcv -O3 > --param=riscv-autovec-preference=scalable -S -fopt-info-vec-missed rvv.c > rvv.c:14:1: missed: couldn't vectorize loop > rvv.c:14:1: missed: not vectorized: no vectype for stmt: _4 = *_3; > scalar_type: uint16_t > > > Still can not vectorize it. Well, that means we do not have a vector mode for HImode elements?!
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

--- Comment #9 from JuzheZhong ---
The name is correct, since the same pattern works for uint32 but fails to work for uint16. I checked the build file:

CODE_FOR_vlshrrvvm1hi3 = 10350,

>> Well, that means we do not have a vector mode for HImode elements?!

We have a vector mode for HImode. You can see CODE_FOR_vlshrrvvm1hi3; the "rvvm1hi" is vector HImode.

Consider the following case:

  #include <stdint.h>

  #define TEST2_TYPE(TYPE) \
    __attribute__((noipa)) \
    void vshiftr_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
    { \
      for (int i = 0; i < n; i++) \
        dst[i] = (a[i]) >> b[i]; \
    }

  #define TEST_ALL() \
    TEST2_TYPE(uint32_t) \
    TEST2_TYPE(uint16_t)

  TEST_ALL()

rvv.c:15:1: missed: statement clobbers memory: vect__4.9_52 = .MASK_LEN_LOAD (vectp_a.7_50, 32B, { -1, ... }, _65, 0);
rvv.c:15:1: missed: statement clobbers memory: vect__6.12_56 = .MASK_LEN_LOAD (vectp_b.10_54, 32B, { -1, ... }, _65, 0);
rvv.c:15:1: missed: statement clobbers memory: .MASK_LEN_STORE (vectp_dst.14_59, 32B, { -1, ... }, _65, 0, vect__8.13_57);
rvv.c:15:1: missed: couldn't vectorize loop
rvv.c:15:1: missed: not vectorized: no vectype for stmt: _4 = *_3; scalar_type: uint16_t

uint32_t can be vectorized but uint16_t fails, although we have defined both vector SImode and HImode for the "vlshr$a3" optab.

It seems that we must support widen shift pattern in RISCV port even though we don't have widen shift instructions?
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

--- Comment #10 from rsandifo at gcc dot gnu.org ---
(In reply to JuzheZhong from comment #9)
> It seems that we must support widen shift pattern in RISCV port even though
> we don't have widen shift instructions ?

I doubt it. Seems like one of those bugs where someone needs to walk through what's happening in the code, rather than relying on the debug dumps.
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #14 from Stefan Schulze Frielinghaus --- For -3 and -4 I can confirm that we do not end up with a proper comparison during combine which means we should just ignore these on Sparc. I'm currently puzzled that -5 and -6 are actually processed on Sparc (32 or 64 bit) at all. Shouldn't this: /* { dg-do compile { target { lp64 } && ! target { sparc*-*-* } } } */ prevent this?
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

--- Comment #11 from JuzheZhong ---
I debugged vectorizable_shift:

Breakpoint 1, vectorizable_shift (vinfo=0x3fb45d0, stmt_info=0x3fb5ea0, gsi=0x0, vec_stmt=0x0, slp_node=0x0, cost_vec=0x7fffc648) at ../../../riscv-gcc/gcc/tree-vect-stmts.cc:6028
6028      scalar_dest = gimple_assign_lhs (stmt);
(gdb) n
6029      vectype_out = STMT_VINFO_VECTYPE (stmt_info);
(gdb) p scalar_dest->typed.type->type_common.mode
$7 = E_HImode
(gdb) call print_gimple_stmts(stdout,stmt,0,0)
No symbol "print_gimple_stmts" in current context.
(gdb) call print_gimple_stmt(stdout,stmt,0,0)
patt_33 = _4 >> patt_34;

It's odd here: we are supposed to vectorize the following code from the ifcvt dump:

_5 = (int) _4;
_8 = (int) _7;
_9 = _5 >> _8;

You can see "_9 = _5 >> _8;". We should vectorize SImode instead of HImode. The correct flow should be first extend HI -> SImode, then vectorize logical shift right for SImode, and finally truncate SImode to HImode. Am I right?

When I debug tree-vect-stmts.cc, the vectorization flow doesn't work as we want?

Thanks.
[Bug ada/110898] compilation of adacl-assert-integer.ads failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110898

--- Comment #1 from Marc Poulhiès ---
I get the following error when compiling the adacl-assert-integer.ads file:

```
src/adacl-assert-integer.ads:21:10: warning: unit "GNAT.Source_Info" is not referenced [-gnatwu]
src/adacl-assert-integer.ads:25:34: (style) trailing spaces not permitted [-gnatyb]
src/adacl-assert-integer.ads:31:01: error: child of a generic package must be a generic unit
```

I've checked and I also get the same errors with gcc 11.x, so that's not something new. I think your code should be fixed here.
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

--- Comment #12 from rsandifo at gcc dot gnu.org ---
(In reply to JuzheZhong from comment #11)
> You can see "_9 = _5 >> _8;". We should vectorize SImode instead of HImode.
> The correct flow should be first extend HI -> SImode, Then vectorize
> logical shift right for SImode, and finally truncate SImode to HImode.

The point of vect_recog_over_widening_pattern is to avoid the extension and truncation. So this is working as expected. The question is why doing the optimisation prevents vectorisation, given that the target apparently provides HImode shifts right.
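
Why the extension and truncation can be avoided for this statement can be seen from a small scalar C sketch. The names are hypothetical and this is not how the vectorizer reasons internally; it only checks, under the assumption that the shift amount is below 16, that the int-sized result of a right shift of a zero-extended uint16_t always fits in 16 bits, so truncating it discards nothing.

```c
#include <assert.h>
#include <stdint.h>

/* The statement as C's integer promotions produce it:
   _5 = (int) _4; _8 = (int) _7; _9 = _5 >> _8;  */
int wide_shift(uint16_t a, uint16_t b)
{
  return (int)a >> (int)b;
}

/* The same statement followed by the truncation back to 16 bits.  */
uint16_t truncated_shift(uint16_t a, uint16_t b)
{
  return (uint16_t)wide_shift(a, b);
}

/* Hypothetical exhaustive check for shift amounts 0..15: the wide
   result is never larger than UINT16_MAX, so the narrowed value is
   identical to the wide one.  */
void check_no_bits_lost(void)
{
  for (uint32_t a = 0; a <= UINT16_MAX; a++)
    for (uint16_t b = 0; b < 16; b++)
      {
        int wide = wide_shift((uint16_t)a, b);
        assert(wide >= 0 && wide <= UINT16_MAX);
        assert(wide == (int)truncated_shift((uint16_t)a, b));
      }
}
```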
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897

--- Comment #13 from JuzheZhong ---
I just checked: ARM SVE has the same behavior as RISC-V:

https://godbolt.org/z/vY6ecY6Mx

You can see this in Compiler Explorer. ARM trunk GCC with SVE fails to vectorize it too, same as RISC-V, whereas ARM GCC 13.1 can vectorize it.
[Bug tree-optimization/106293] [13 regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Jan Hubicka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[13/14 Regression]          |[13 regression] 456.hmmer
                   |456.hmmer at -Ofast         |at -Ofast -march=native
                   |-march=native regressed by  |regressed by 19% on zen2
                   |19% on zen2 and zen3 in     |and zen3 in July 2022
                   |July 2022                   |

--- Comment #26 from Jan Hubicka ---
We are out of regression finally, but there are still several things to fix.

1) vectorizer produces corrupt profile
2) loop-split is not able to work out that it splits the last iteration
3) we work way too hard optimizing loops iterating 0 times.

The loop in question really iterates zero times. It is created by loop split from the internal loop:

      for (k = 1; k <= M; k++) {
        mc[k] = mpp[k-1] + tpmm[k-1];
        if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
        if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
        if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
        mc[k] += ms[k];
        if (mc[k] < -INFTY) mc[k] = -INFTY;

        dc[k] = dc[k-1] + tpdd[k-1];
        if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
        if (dc[k] < -INFTY) dc[k] = -INFTY;

        if (k < M) {
          ic[k] = mpp[k] + tpmi[k];
          if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
          ic[k] += is[k];
          if (ic[k] < -INFTY) ic[k] = -INFTY;
        }
      }

It peels off the last iteration. The for condition is (k <= M) while we split on if (k < M). M is variable, and nothing seems to be able to optimize out the second loop after splitting.

My plan is to add the pattern match so loop split gets this right and records an upper bound on the iteration count, but first I want to show other bugs exposed by this scenario.
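
The effect of splitting on if (k < M) can be shown on a reduced loop. The following C sketch is a hand-written stand-in, not the hmmer code and not what the loop-split pass emits; the array names and the bound N are made up. It assumes M < N and checks that the split form computes the same results and that the body of the second loop runs at most once, which is the bound the comment above wants loop split to record.

```c
#include <assert.h>
#include <string.h>

enum { N = 16 };

/* Original shape: the inner condition k < M holds on every iteration
   except the last one.  */
void loop_original(int M, int *a, int *b)
{
  for (int k = 1; k <= M; k++)
    {
      a[k] = k;
      if (k < M)
        b[k] = 2 * k;
    }
}

/* Hand-split shape: the first loop runs while k < M with the condition
   known true, the second loop handles what is left with the condition
   known false.  Returns how often the second loop body executed.  */
int loop_split(int M, int *a, int *b)
{
  int k = 1;
  int tail_iterations = 0;
  for (; k < M; k++)
    {
      a[k] = k;
      b[k] = 2 * k;
    }
  for (; k <= M; k++)
    {
      a[k] = k;
      tail_iterations++;
    }
  return tail_iterations;
}

/* Returns 1 if both forms agree for this M and the tail ran at most once.  */
int split_matches(int M)
{
  int a1[N] = { 0 }, b1[N] = { 0 }, a2[N] = { 0 }, b2[N] = { 0 };
  loop_original(M, a1, b1);
  int tail = loop_split(M, a2, b2);
  return memcmp(a1, a2, sizeof a1) == 0
         && memcmp(b1, b2, sizeof b1) == 0
         && tail <= 1;
}

/* Hypothetical self-check over all trip counts that fit the arrays.  */
void check_split(void)
{
  for (int M = 0; M < N; M++)
    assert(split_matches(M));
}
```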
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #14 from JuzheZhong --- (In reply to rsand...@gcc.gnu.org from comment #12) > (In reply to JuzheZhong from comment #11) > > You can see "_9 = _5 >> _8;". We should vectorize SImode instead of HImode. > > The correct follow should be first extend HI -> SImode, Then vectorize > > logical shift right for SImode, and finally truncate SImode to HImode. > The point of vect_recog_over_widening_pattern is to avoid the extension and > truncation. So this is working as expected. The question is why doing the > optimisation prevents vectorisation, given that the target apparently > provides HImode shifts right. Oh, thanks Richard. After deeper analysis, I found that this code makes it fail:

      incompatible_op1_vectype_p
        = (op1_vectype == NULL_TREE
           || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
                        TYPE_VECTOR_SUBPARTS (vectype))
           || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype));
      if (incompatible_op1_vectype_p
          && (!slp_node
              || SLP_TREE_DEF_TYPE (slp_op1) != vect_constant_def
              || slp_op1->refcnt != 1))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "unusable type for last operand in"
                             " vector/vector shift/rotate.\n");
          return false;
        }

incompatible_op1_vectype_p is true. It becomes true because op1_vectype has a different NUNITS from vectype. They differ because op1_vectype = get_vectype_for_scalar_type = RVVM1SImode, while vectype = STMT_VINFO_VECTYPE (stmt_info) = RVVMF2SImode. That mismatch is what makes it fail. To make this easier to follow for ARM SVE, I believe the equivalent there is op1_vectype = get_vectype_for_scalar_type = VNx4SImode and vectype = STMT_VINFO_VECTYPE (stmt_info) = VNx2SImode, so ARM SVE fails too. When that commit is reverted, they are the same (both are RVVM1SImode for RISC-V or VNx4SImode for ARM SVE). Could you tell me how to fix that? Thanks.
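The point quoted from comment #12, that the extension to SImode and the truncation back to HImode can be skipped, can be checked at the scalar level. The sketch below is an emulation only: C always promotes `uint16_t` operands to `int`, so the "narrow" variant just restricts itself to the low 16 bits. The function names are hypothetical, and the equivalence is stated only for shift amounts below 16.

```c
#include <assert.h>
#include <stdint.h>

/* What the source loop says: a is promoted to int (HI -> SI), shifted
   there, and the result is truncated back to 16 bits.  */
static uint16_t shift_via_int(uint16_t a, uint16_t b)
{
    assert(b < 16);
    int wide = a;                  /* zero extension to 32 bits */
    return (uint16_t)(wide >> b);  /* 32-bit shift, then truncation */
}

/* Demoted form: only the low 16 bits of the operand are looked at.
   Because the upper 16 bits of the widened value are zero, no bit
   that survives the truncation can come from them.  */
static uint16_t shift_narrow(uint16_t a, uint16_t b)
{
    assert(b < 16);
    uint16_t low = (uint16_t)(a & 0xFFFFu);
    return (uint16_t)(low >> b);
}
```

An exhaustive comparison over all 65536 operand values and all 16 in-range shift amounts is cheap and confirms that the two forms agree.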
[Bug middle-end/110857] aarch64-linux-gnu profiledbootstrap broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110857 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2023-08-04 Ever confirmed|0 |1 --- Comment #4 from Jan Hubicka --- I hope the fix for x86_64 also cures arm profiledbootstrap. From backtrace it is the same bug.
[Bug tree-optimization/110838] [14 Regression] wrong code on x365-3.5, -O3, sign extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110838 --- Comment #10 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:04aa0edcace22a7815cfc57575f1f7b1f166ac10 commit r14-2985-g04aa0edcace22a7815cfc57575f1f7b1f166ac10 Author: Richard Biener Date: Fri Aug 4 11:24:49 2023 +0200 tree-optimization/110838 - less aggressively fold out-of-bound shifts The following adjusts the shift simplification patterns to avoid touching out-of-bound shift value arithmetic right shifts of possibly negative values. While simplifying those to zero isn't wrong it's violating the principle of least surprise. PR tree-optimization/110838 * match.pd (([rl]shift @0 out-of-bounds) -> zero): Restrict the arithmetic right-shift case to non-negative operands.
[Bug middle-end/110857] aarch64-linux-gnu profiledbootstrap broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110857 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- Hi Honza, Sorry for late response, and thanks for the fix! I am currently running profiledbootstrap on aarch64 with your fix, and will let you know the results after it completes. Thanks, Prathamesh
[Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316 --- Comment #4 from CVS Commits --- The master branch has been updated by Matthew Malcomson : https://gcc.gnu.org/g:0782b01c9ea43d43648071faa9c65a101f5068a2 commit r14-2986-g0782b01c9ea43d43648071faa9c65a101f5068a2 Author: Matthew Malcomson Date: Fri Aug 4 11:26:47 2023 +0100 mid-end: Use integral time intervals in timevar.cc On some AArch64 bootstrapped builds, we were getting a flaky test because the floating point operations in `get_time` were being fused with the floating point operations in `timevar_accumulate`. This meant that the rounding behaviour of our multiplication with `ticks_to_msec` was different when used in `timer::start` and when performed in `timer::stop`. These extra inaccuracies led to the testcase `g++.dg/ext/timevar1.C` being flaky on some hardware. -- Avoiding the inlining which was agreed to be undesirable. Three alternative approaches: 1) Use `-ffp-contract=on` to avoid this particular optimisation. 2) Adjusting the code so that the "tolerance" is always of the order of a "tick". 3) Recording times and elapsed differences in integral values. - Could be in terms of a standard measurement (e.g. nanoseconds or microseconds). - Could be in terms of whatever integral value ("ticks" / seconds&microseconds / "clock ticks") is returned from the syscall chosen at configure time. While `-ffp-contract=on` removes the problem that I bumped into, there has been a similar bug on x86 that was to do with a different floating point problem that also happens after `get_time` and `timevar_accumulate` both being inlined into the same function. Hence it seems worth choosing a different approach. Of the two other solutions, recording measurements in integral values seems the most robust against slightly "off" measurements being presented to the user -- even though it could avoid the ICE that creates a flaky test. 
I considered storing time in whatever units our syscall returns and normalising them at the time we print out rather than normalising them to nanoseconds at the point we record our "current time". The logic being that normalisation could have some rounding effect (e.g. if TICKS_PER_SECOND is 3) that would be taken into account in calculations. I decided against it in order to give the values recorded in `timevar_time_def` some interpretive value so it's easier to read the code. Compared to the small rounding that would represent a tiny amount of time and AIUI can not trigger the same kind of ICEs as we are attempting to fix, said interpretive value seems more valuable. Recording time in microseconds seemed reasonable since all obvious values for ticks and `getrusage` are at microsecond granularity or less precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both variables given to us by the host system I was not sure of that enough to make this decision. -- timer::all_zero is ignoring rows which are inconsequential to the user and would be printed out as all zeros. Since upon printing rows we convert to the same double value and print out the same precision as before, we return true/false based on the same amount of time as before. timer::print_row casts to a floating point measurement in units of seconds as was printed out before. timer::validate_phases -- I'm printing out nanoseconds here rather than floating point seconds since this is an error message for when things have "gone wrong" printing out the actual nanoseconds that have been recorded seems like the best approach. N.b. since we now print out nanoseconds instead of floating point value the padding requirements are different. Originally we were padding to 24 characters and printing 18 decimal places. This looked odd with the now visually smaller values getting printed. 
I judged 13 characters (corresponding to 2 hours) to be a reasonable point at which our alignment could start to degrade and this provides a more compact output for the majority of cases (checked by triggering the error case via GDB). -- N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe this would fit in an integer on all hosts that GCC supports, but am not certain there are not strange integer sizes we support hence am pointing it out for special attention during review. -- No expected change in generated code. Bootstrapped and regtested on AArch64 with no regressions. Hope this is acceptable -- I had originally planned to use `-ffp-contract` as agreed until I saw mention of the old x86 bug in the same area which was not to do with flo
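The third approach in the commit message, keeping times as integral nanoseconds, can be sketched as follows. This is a minimal illustration, not the code in timevar.cc: `ticks_to_nanosec` and `elapsed_nanosec` are hypothetical helpers, and the conversion assumes `ticks_per_sec` is small enough that the remainder times 1000000000 fits in 64 bits. Because every step is integer arithmetic, the same tick count always maps to the same value no matter how the compiler inlines or contracts the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC_PER_SEC 1000000000ULL

/* Convert a tick count to whole nanoseconds using integer arithmetic
   only.  Splitting into whole seconds and a remainder keeps the
   intermediate product small.  The result is rounded down.  */
static uint64_t ticks_to_nanosec(uint64_t ticks, uint64_t ticks_per_sec)
{
    assert(ticks_per_sec != 0);
    uint64_t secs = ticks / ticks_per_sec;
    uint64_t rem = ticks % ticks_per_sec;
    return secs * NANOSEC_PER_SEC + rem * NANOSEC_PER_SEC / ticks_per_sec;
}

/* Elapsed time is an exact integer subtraction, so a stop time taken
   after a start time can never yield a negative interval.  */
static uint64_t elapsed_nanosec(uint64_t start_ns, uint64_t stop_ns)
{
    assert(stop_ns >= start_ns);
    return stop_ns - start_ns;
}
```

With a tick rate of 3 per second, the case the message uses as its rounding example, one tick becomes 333333333 ns: the truncation is at most one nanosecond per conversion.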
[Bug c/9903] [3.2 regression] ICE for legal code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9903 --- Comment #3 from CVS Commits --- The master branch has been updated by Matthew Malcomson : https://gcc.gnu.org/g:0782b01c9ea43d43648071faa9c65a101f5068a2 commit r14-2986-g0782b01c9ea43d43648071faa9c65a101f5068a2 Author: Matthew Malcomson Date: Fri Aug 4 11:26:47 2023 +0100 mid-end: Use integral time intervals in timevar.cc
[Bug tree-optimization/110838] [14 Regression] wrong code on x365-3.5, -O3, sign extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110838 --- Comment #11 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:1a599caab86464006ea8c9501aff6c6638e891eb commit r14-2987-g1a599caab86464006ea8c9501aff6c6638e891eb Author: Richard Biener Date: Fri Aug 4 12:11:45 2023 +0200 tree-optimization/110838 - vectorization of widened right shifts The following fixes a problem with my last attempt of avoiding out-of-bound shift values for vectorized right shifts of widened operands. Instead of truncating the shift amount with a bitwise and we actually need to saturate it to the target precision. The following does that and adds test coverage for the constant and invariant but variable case that would previously have failed. PR tree-optimization/110838 * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Fix right-shift value sanitizing. Properly emit external def mangling in the preheader rather than in the pattern def sequence where it will fail vectorizing. * gcc.dg/vect/pr110838.c: New testcase.
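The difference between truncating the shift amount with a bitwise and and saturating it can be seen on scalars. This sketch makes several assumptions that are not in the commit message: it uses a signed 16-bit operand widened to 32 bits, it clamps to the bit width minus one (15), and it relies on `>>` of a negative value being an arithmetic shift, which is implementation-defined in C but is what GCC does. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics: widen to 32 bits, shift by s in [0, 31],
   truncate back to 16 bits.  */
static int16_t shift_wide(int16_t x, unsigned s)
{
    assert(s < 32);
    return (int16_t)((int32_t)x >> s);
}

/* Two ways of forcing the amount into the 16-bit range.  */
static unsigned saturate_amount(unsigned s) { return s < 15 ? s : 15; }
static unsigned mask_amount(unsigned s)     { return s & 15; }

/* Emulated 16-bit arithmetic right shift; the amount must already be
   in range.  */
static int16_t shift_narrow(int16_t x, unsigned s)
{
    assert(s < 16);
    return (int16_t)(x >> s);
}
```

For `x = 16384` and `s = 16` the widened shift yields 0. Saturating gives `16384 >> 15`, also 0, while masking turns the amount into 0 and returns 16384 unchanged.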
[Bug target/110066] [13 Regression] [RISC-V] Segment fault if compiled with -static -pg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110066 --- Comment #25 from Aurelien Jarno --- (In reply to Andrew Pinski from comment #23) > Fixed on the trunk will backport to GCC 13 after 13.2.0 is released (since > the branch is frozen except for RM approvals). Now that GCC 13.2.0 has been released, would it be possible to backport the fix, please?
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #15 from Richard Biener --- Well, the question is why we arrive here with the two different vector types. Can you tell me a relevant cc1 compiler command like for a x86->riscv cross that exposes the issue?
[Bug sanitizer/81981] [8 Regression] -fsanitize=undefined makes a -Wmaybe-uninitialized warning disappear
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81981 --- Comment #9 from Vincent Lefèvre --- Note, however, that there is a small regression in GCC 11: the warning for t is output as expected, but if -fsanitize=undefined is given, the message for t is suboptimal, saying "*&t[0]" instead of "t[0]": zira:~> gcc-11 -Wmaybe-uninitialized -O2 -c tst.c -fsanitize=undefined tst.c: In function ‘foo’: tst.c:12:15: warning: ‘*&t[0]’ may be used uninitialized in this function [-Wmaybe-uninitialized] 12 | return t[0] + u[0]; | ~^~ tst.c:12:15: warning: ‘u[0]’ may be used uninitialized in this function [-Wmaybe-uninitialized] No such issue without -fsanitize=undefined: zira:~> gcc-11 -Wmaybe-uninitialized -O2 -c tst.c tst.c: In function ‘foo’: tst.c:12:15: warning: ‘u[0]’ may be used uninitialized in this function [-Wmaybe-uninitialized] 12 | return t[0] + u[0]; | ~^~ tst.c:12:15: warning: ‘t[0]’ may be used uninitialized in this function [-Wmaybe-uninitialized] It is impossible to say whether this is fixed in GCC 12 and later, because of PR 110896, i.e. the warning is always missing.
[Bug modula2/110779] SysClock can not read the clock
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110779 Gaius Mulley changed: What|Removed |Added Attachment #55683|0 |1 is obsolete|| --- Comment #3 from Gaius Mulley --- Created attachment 55687 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55687&action=edit Proposed fix v2 The previous patch was missing some new files. This has successfully bootstrapped on x86_64 and aarch64. I'd like to see it bootstrap on ppc64le, x86_32 and armv7l before it is git committed (as the libgm2 automake {Makefile.in, configure, config.h} have been regenerated).
[Bug c++/110848] Consider enabling -Wvla by default in non-GNU C++ modes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110848 --- Comment #12 from Aaron Ballman --- (In reply to Eric Gallager from comment #11) > How about: > > -std=c++XY: enabled by default (as per the proposal) > -std=gnu++XY: enabled by -Wall and/or -Wextra (in addition to being enabled > by -pedantic like it already is) That's a good suggestion -- I'd be quite happy with adding it to -Wall (or barring that, -Wextra) in GNU++ modes.
[Bug target/106346] [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346 --- Comment #8 from CVS Commits --- The master branch has been updated by Tamar Christina : https://gcc.gnu.org/g:451391a6477f5b012faeca42cdba1bfb8e6eecc0 commit r14-2991-g451391a6477f5b012faeca42cdba1bfb8e6eecc0 Author: Tamar Christina Date: Fri Aug 4 13:49:23 2023 +0100 AArch64: Undo vec_widen_shiftl optabs [PR106346] In GCC 11 we implemented the vectorizer optab for widening left shifts, however this optab is only supported for uniform shift constants. At the moment GCC still has two loop vectorization strategies (classical loop and SLP based loop vec) and the optab is implemented as a scalar pattern. This means that when we apply it to a non-uniform constant inside a loop we only find out during SLP build that the constants aren't uniform. At this point it's too late and we lose SLP entirely. Over the years I've tried various options but none of them works well: 1. Dissolving patterns during SLP build (problematic, also dissolves them for non-slp). 2. Optionally ignoring patterns for SLP build (problematic, ends up interfering with relevancy detection). 3. Relaxing the constraint on SLP build to allow non-constant values and dissolving them after SLP build using an SLP pattern. (problematic, ends up breaking shift reassociation). As a result we've concluded that for now this pattern should just be removed and formed during RTL. The plan is to move this to an SLP only pattern once we remove classical loop vectorization support from GCC, at which time we can also properly support SVE's Top and Bottom variants. This removes the optab and reworks the RTL to recognize both the vector variant and the intrinsics variant. Also just simplifies all these patterns. gcc/ChangeLog: PR target/106346 * config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_, vec_widen_shiftl_hi_): Remove. (aarch64_shll_internal): Renamed to... (aarch64_shll): .. This. (aarch64_shll2_internal): Renamed to... (aarch64_shll2): .. This. 
(aarch64_shll_n, aarch64_shll2_n): Re-use new optabs. * config/aarch64/constraints.md (D2, DL): New. * config/aarch64/predicates.md (aarch64_simd_shll_imm_vec): New. gcc/testsuite/ChangeLog: PR target/106346 * gcc.target/aarch64/pr98772.c: Adjust assembly. * gcc.target/aarch64/vect-widen-shift.c: New test.
[Bug target/110899] New: RFE: Attributes preserve_most and preserve_all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899 Bug ID: 110899 Summary: RFE: Attributes preserve_most and preserve_all Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: elver at google dot com Target Milestone: --- Clang/LLVM implements the function attributes "preserve_most" and "preserve_all": [1] preserve_most: "On X86-64 and AArch64 targets, this attribute changes the calling convention of a function. The preserve_most calling convention attempts to make the code in the caller as unintrusive as possible. This convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This alleviates the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers. - On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Floating-point registers (XMMs/YMMs) are not preserved and need to be saved by the caller. - On AArch64 the callee preserve all general purpose registers, except X0-X8 and X16-X18." [2] preserve_all: "On X86-64 and AArch64 targets, this attribute changes the calling convention of a function. The preserve_all calling convention attempts to make the code in the caller even less intrusive than the preserve_most calling convention. This calling convention also behaves identical to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This removes the burden of saving and recovering a large register set before and after the call in the caller. 
If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers. - On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Furthermore it also preserves all floating-point registers (XMMs/YMMs). - On AArch64 the callee preserve all general purpose registers, except X0-X8 and X16-X18. Furthermore it also preserves lower 128 bits of V8-V31 SIMD - floating point registers." [1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most [2] https://clang.llvm.org/docs/AttributeReference.html#preserve-all These attributes, esp. preserve_most, provide a convenient way to optimize the generated code for calls to rarely taken slow paths, such as error-reporting functions. Recently, we have been looking to make use of this in the Linux kernel [3], with potentially additional use cases being discussed. [3] https://lkml.kernel.org/r/20230804090621.400-1-el...@google.com
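A use of preserve_most on a rarely taken error path might look like the sketch below. It assumes nothing about GCC support: the attribute is applied only when the compiler reports it through `__has_attribute`, and otherwise the macro expands to nothing. `PRESERVE_MOST`, `report_error`, `checked_div` and `error_count` are hypothetical names chosen for this example.

```c
#include <stdio.h>

/* Use the attribute only where the compiler says it exists.  */
#if defined(__has_attribute)
#  if __has_attribute(preserve_most)
#    define PRESERVE_MOST __attribute__((preserve_most))
#  endif
#endif
#ifndef PRESERVE_MOST
#  define PRESERVE_MOST /* unsupported: default calling convention */
#endif

static int error_count;

/* Slow path: called rarely, so the cost of saving registers is moved
   into the callee instead of every call site.  noinline keeps it an
   actual call.  */
PRESERVE_MOST __attribute__((noinline))
static void report_error(const char *msg)
{
    error_count++;
    fprintf(stderr, "error: %s\n", msg);
}

/* Fast path: with preserve_most the caller needs to spill fewer
   registers around the (unlikely) call.  */
static int checked_div(int a, int b, int *out)
{
    if (b == 0) {
        report_error("division by zero");
        return -1;
    }
    *out = a / b;
    return 0;
}
```

The observable behaviour is the same with or without the attribute; only the register save and restore code around the call differs.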
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 106346, which changed state. Bug 106346 Summary: [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug target/106346] [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Tamar Christina --- Fixed in GCC 14.
[Bug c/108986] [11 Regression] Incorrect warning for [static] array parameter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108986 Martin Uecker changed: What|Removed |Added CC||muecker at gwdg dot de --- Comment #10 from Martin Uecker --- PATCH: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625559.html
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #15 from Stefan Schulze Frielinghaus --- Created attachment 55688 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55688&action=edit Increase optimization and skip sparc for 4-6
[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869 --- Comment #16 from Stefan Schulze Frielinghaus --- Turns out that my dejagnu foo is weak ;-) I came up with a wrong target selector. Should be fixed in the new attachment.
[Bug target/110899] RFE: Attributes preserve_most and preserve_all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/110897] RISC-V: Fail to vectorize shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110897 --- Comment #16 from JuzheZhong --- (In reply to Richard Biener from comment #15) > Well, the question is why we arrive here with the two different vector types. > Can you tell me a relevant cc1 compiler command like for a x86->riscv cross > that exposes the issue? Thanks for taking care of this issue. The RISC-V cc1 command: cc1 -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable For ARM SVE: -march=armv8-a+sve -O3 This issue is exposed on both RISC-V and ARM. Code:

#include <stdint.h>

#define TEST2_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vshiftr_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = (a[i]) >> b[i]; \
  }

#define TEST_ALL() \
  TEST2_TYPE(uint16_t) \

TEST_ALL()
[Bug middle-end/88873] missing vectorization for decomposed operations on a vector type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873 --- Comment #10 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:faa2202ee7fcf039b2016ce5766a2927526c5f78 commit r14-2997-gfaa2202ee7fcf039b2016ce5766a2927526c5f78 Author: Roger Sayle Date: Fri Aug 4 16:23:38 2023 +0100 i386: Split SUBREGs of SSE vector registers into vec_select insns. This patch is the final piece in the series to improve the ABI issues affecting PR 88873. The previous patches tackled inserting DFmode values into V2DFmode registers, by introducing insvti_{low,high}part patterns. This patch improves the extraction of DFmode values from V2DFmode registers via TImode intermediates. I'd initially thought this would require new extvti_{low,high}part patterns to be defined, but all that's required is to recognize that the SUBREG idioms produced by combine are equivalent to (forms of) vec_select patterns. The target-independent middle-end can't be sure that the appropriate vec_select instruction exists on the target, hence doesn't canonicalize a SUBREG of a vector mode as a vec_select, but the backend can provide a define_split stating where and when this is useful, for example, considering whether the operand is in memory, or whether !TARGET_SSE_MATH and the destination is i387. 
For pr88873.c, gcc -O2 -march=cascadelake currently generates:

foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa         %xmm7, -24(%rsp)
        vmovdqa         %xmm6, %xmm1
        movq            -16(%rsp), %rax
        vpinsrq         $1, %rax, %xmm7, %xmm4
        vmovapd         %xmm4, %xmm6
        vfmadd132pd     %xmm1, %xmm2, %xmm6
        vmovapd         %xmm6, -24(%rsp)
        vmovsd          -16(%rsp), %xmm1
        vmovsd          -24(%rsp), %xmm0
        ret

with this patch, we now generate:

foo:    vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa         %xmm6, %xmm1
        vfmadd132pd     %xmm7, %xmm2, %xmm1
        vmovsd          %xmm1, %xmm1, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm1
        ret

The improvement is even more dramatic when compared to the original 29 instructions shown in comment #8. GCC 13, for example, required 12 transfers to/from memory. 2023-08-04 Roger Sayle gcc/ChangeLog * config/i386/sse.md (define_split): Convert highpart:DF extract from V2DFmode register into a sse2_storehpd instruction. (define_split): Likewise, convert lowpart:DF extract from V2DF register into a sse2_storelpd instruction. gcc/testsuite/ChangeLog * gcc.target/i386/pr88873.c: Tweak to check for improved code.
[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717 --- Comment #15 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:c572f09a751cbd365e2285b30527de5ab9025972 commit r14-2998-gc572f09a751cbd365e2285b30527de5ab9025972 Author: Roger Sayle Date: Fri Aug 4 16:26:06 2023 +0100 Specify signed/unsigned/dontcare in calls to extract_bit_field_1. This patch is inspired by Jakub's work on PR rtl-optimization/110717. The bitfield example described in comment #2, looks like:

struct S { __int128 a : 69; };
unsigned type bar (struct S *p) {
  return p->a;
}

which on x86_64 with -O2 currently generates:

bar:    movzbl  8(%rdi), %ecx
        movq    (%rdi), %rax
        andl    $31, %ecx
        movq    %rcx, %rdx
        salq    $59, %rdx
        sarq    $59, %rdx
        ret

The ANDL $31 is interesting... we first extract an unsigned 69-bit bitfield by masking/clearing the top bits of the most significant word, and then it gets sign-extended, by left shifting and arithmetic right shifting. Obviously, this bit-wise AND is redundant: for signed bit-fields, we don't require these bits to be cleared if we're about to set them appropriately. This patch eliminates this redundancy in the middle-end, during RTL expansion, by extending the extract_bit_field APIs so that the integer UNSIGNEDP argument takes a special value; 0 indicates the field should be sign extended, 1 (any non-zero value) indicates the field should be zero extended, but -1 indicates a third option, that we don't care how or whether the field is extended. By passing and checking this sentinel value at the appropriate places we avoid the useless bit masking (on all targets). For the test case above, with this patch we now generate:

bar:    movzbl  8(%rdi), %ecx
        movq    (%rdi), %rax
        movq    %rcx, %rdx
        salq    $59, %rdx
        sarq    $59, %rdx
        ret

2023-08-04 Roger Sayle gcc/ChangeLog * expmed.cc (extract_bit_field_1): Document that an UNSIGNEDP value of -1 is equivalent to don't care. 
(extract_integral_bit_field): Indicate that we don't require the most significant word to be zero extended, if we're about to sign extend it. (extract_fixed_bit_field_1): Document that an UNSIGNEDP value of -1 is equivalent to don't care. Don't clear the most significant bits with AND mask when UNSIGNEDP is -1. gcc/testsuite/ChangeLog * gcc.target/i386/pr110717-2.c: New test case.
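Why the AND with 31 is redundant can be checked on the high word alone: a 69-bit field leaves 5 bits in the upper 64-bit word, and the left shift by 59 discards everything above them anyway. The sketch below uses hypothetical helper names and relies on two implementation-defined behaviours that GCC documents: conversion of an out-of-range `uint64_t` to `int64_t` wraps modulo 2^64, and `>>` on a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the old sequence: clear the upper bits (andl $31), then
   sign-extend with a left shift and an arithmetic right shift.  */
static int64_t sext5_masked(uint64_t hi)
{
    uint64_t field = hi & 31;
    return (int64_t)(field << 59) >> 59;
}

/* Mirrors the new sequence: the shift left by 59 already pushes bits
   5..63 out of the word, so the mask has no effect on the result.  */
static int64_t sext5_unmasked(uint64_t hi)
{
    return (int64_t)(hi << 59) >> 59;
}
```

Both helpers return the same value for every input; for instance an input whose low five bits are 10000 yields -16 regardless of what the upper bits hold.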
[Bug c++/110900] New: std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 Bug ID: 110900 Summary: std::string initializes SSO object subfield without making the SSO object active in the union Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: danakj at orodu dot net Target Milestone: --- Specific errors by clang: note: construction of subobject of member '_M_local_buf' of union with no active member is not allowed in a constant expression error: accessing ‘std::__cxx11::basic_string_M_allocated_capacity’ member instead of initialized ‘std::__cxx11::basic_string_M_local_buf’ member in constant expression Specific error by GCC: error: accessing ‘std::__cxx11::basic_string_M_allocated_capacity’ member instead of initialized ‘std::__cxx11::basic_string_M_local_buf’ member in constant expression Full errors: Here's the clang 17 error: /usr/include/c++/12/bits/stl_construct.h:97:14: note: construction of subobject of member '_M_local_buf' of union with no active member is not allowed in a constant expression 97 | { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); } | ^ /usr/include/c++/12/bits/char_traits.h:262:6: note: in call to 'construct_at(&[]() { std::string acc; sus::Array::with('a', 'b', 'c', 'd', 'e').into_iter().for_each([&](char v) { acc.push_back(v); }); return acc; }().._M_local_buf[0], acc.._M_local_buf[0])' 262 | std::construct_at(__s1 + __i, __s2[__i]); | ^ /usr/include/c++/12/bits/char_traits.h:429:11: note: in call to 'copy(&[]() { std::string acc; sus::Array::with('a', 'b', 'c', 'd', 'e').into_iter().for_each([&](char v) { acc.push_back(v); }); return acc; }().._M_local_buf[0], &acc.._M_local_buf[0], 6)' 429 | return __gnu_cxx::char_traits::copy(__s1, __s2, __n); | ^ /usr/include/c++/12/bits/basic_string.h:675:6: note: in call to 'copy(&[]() { std::string acc; sus::Array::with('a', 'b', 'c', 'd', 'e').into_iter().for_each([&](char v) { acc.push_back(v); }); 
return acc; }().._M_local_buf[0], &acc.._M_local_buf[0], 6)' 675 | traits_type::copy(_M_local_buf, __str._M_local_buf, | ^ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1864:12: note: in call to 'basic_string(acc)' 1864 | return acc; |^ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1859:17: note: in call to '[]() { std::string acc; sus::Array::with('a', 'b', 'c', 'd', 'e').into_iter().for_each([&](char v) { acc.push_back(v); }); return acc; }.operator()()' 1859 | static_assert([]() { | ^ Here's the g++ 13 error: /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1792:24: error: non-constant condition for static assertion 1787 | static_assert(sus::Array::with('a', 'b', 'c', 'd', 'e') | ~~ 1788 | .into_iter() | 1789 | .fold(std::string(), [](std::string acc, char v) { | ~~ 1790 | acc.push_back(v); | ~ 1791 | return acc; | ~~~ 1792 | }) == "abcde"); | ~~~^~ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1792:35: in ‘constexpr’ expansion of ‘sus::iter::IteratorBase::fold(B, F) && [with B = std::__cxx11::basic_string; F = {anonymous}::Iterator_Fold_Test::TestBody()::; Iter = sus::containers::ArrayIntoIter; ItemT = char](std::__cxx11::basic_string(), ({anonymous}::Iterator_Fold_Test::TestBody()::(), {anonymous}::Iterator_Fold_Test::TestBody()::()))’ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1792:35: in ‘constexpr’ expansion of ‘sus::fn::call_mut(F&&, Args&& ...) [with F = {anonymous}::Iterator_Fold_Test::TestBody()::&; Args = {std::__cxx11::basic_string, std::allocator >, char}]((* & sus::mem::move&>(init)), (& sus::mem::move&>(o))->sus::option::Option::unwrap())’ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1792:35: in ‘constexpr’ expansion of ‘std::invoke(_Callable&&, _Args&& ...) 
[with _Callable = {anonymous}::Iterator_Fold_Test::TestBody()::&; _Args = {__cxx11::basic_string, allocator >, char}; invoke_result_t<_Fn, _Args ...> = __cxx11::basic_string]((* & sus::mem::forward >((* & args#0))), (* & sus::mem::forward((* & args#1’ /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc:1792:35: in ‘const
[Bug middle-end/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

Thomas Koenig changed:

           What    |Removed                     |Added
          Component|fortran                     |middle-end

--- Comment #3 from Thomas Koenig ---
Interesting problem.

For

  _19 = (*x_13(D))[0];
  _20 = (*y_14(D))[0];
  _21 = _19 * _20;
  _22 = _21 + 0.0;

the multiplication cannot produce a signalling NaN, so the addition of zero should always be a no-op.

For this, a simpler test case would be

  double add(double a, double b) { return a*b + 0.0; }

which gets me, on x86_64,

        mulsd   %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        ret

According to godbolt, icc produces

add:
        mulsd     %xmm1, %xmm0                                  #3.12
        ret

which should be fine. So, an issue for tree optimization?
[Bug c++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900

--- Comment #1 from danakj at orodu dot net ---
I am going to try to work around this by not using std::string in constant expressions. So in the meantime I pushed a branch where this bug will continue to reproduce.

With gcc-13:

  git clone --recurse-submodules https://github.com/danakj/subspace
  cd subspace
  git checkout test origin/libstd-bug-sso
  CXX=path/to/gcc-13 cmake -B out -DSUBSPACE_BUILD_TESTS=ON
  cmake --build out -j 20
[Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888 Thomas Koenig changed: What|Removed |Added Component|middle-end |fortran --- Comment #4 from Thomas Koenig --- Hm, on second thoughts, signed zeros are an issue, resetting to Fortran. Generally, we are in an intrinsic, so we can do whatever we please (we certainly do in the library case, and this is expected behavior). Having -ffast-math applied locally to the BLOCK that the matmul is executed in would be a possibility.
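The signed-zero concern from comment #4 can be seen in a small standalone sketch. It assumes IEEE 754 arithmetic in the default round-to-nearest mode and a build without -ffast-math or -fno-signed-zeros. The helper names are made up for illustration and do not come from the report. When a*b evaluates to -0.0, adding +0.0 yields +0.0, so dropping the addition would change the sign of the result:

```cpp
#include <cassert>
#include <cmath>

// Same expression as the reduced test case in comment #3.
double mul_add_zero(double a, double b) { return a * b + 0.0; }

// Hypothetical helper: true when removing "+ 0.0" would leave the
// sign of the result unchanged for these inputs.
bool add_zero_is_noop(double a, double b) {
    double product = a * b;
    return std::signbit(product) == std::signbit(mul_add_zero(a, b));
}

// Self-check of the two interesting cases.
void check_signed_zero() {
    assert(add_zero_is_noop(2.0, 3.0));    // 6.0 + 0.0 keeps its sign
    assert(!add_zero_is_noop(-1.0, 0.0));  // -0.0 + 0.0 becomes +0.0
}
```

This is why the fold is only valid when signed zeros need not be honoured, which matches the -fno-signed-zeros requirement in the bug title.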
[Bug target/109465] LoongArch: The expansion of memcpy is slow and bloated for some sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109465 Xi Ruoyao changed: What|Removed |Added Target Milestone|--- |14.0
[Bug c++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |WAITING Last reconfirmed||2023-08-04 --- Comment #2 from Andrew Pinski --- Can you please read https://gcc.gnu.org/bugs/ on what we need?
[Bug c++/110158] Cannot use union with std::string inside in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110158 Andrew Pinski changed: What|Removed |Added CC||danakj at orodu dot net --- Comment #3 from Andrew Pinski --- *** Bug 110900 has been marked as a duplicate of this bug. ***
[Bug libstdc++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 Andrew Pinski changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Andrew Pinski --- Dup of bug 110158. *** This bug has been marked as a duplicate of bug 110158 ***
[Bug libstdc++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 --- Comment #4 from danakj at orodu dot net --- The error message is the same as 110158 but to be clear the std::string is not in a union. The error message is about the union _inside_ std::string.
[Bug libstdc++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900

--- Comment #5 from danakj at orodu dot net ---
> Can you please read https://gcc.gnu.org/bugs/ on what we need?

Yeah, sorry, I can't reproduce this locally on my Mac or Windows machine. It reproduces on GitHub Linux CI bots, and I have diagnosed it from there.

https://github.com/chromium/subspace/actions/runs/5758764036/job/15611774084?pr=306

This job is using gcc 13.1.0, and it installs libstdc++-13-dev.

Here's the command that fails:

/usr/bin/g++-13 -I/home/runner/work/subspace/subspace -I/home/runner/work/subspace/subspace/third_party/googletest -I/home/runner/work/subspace/subspace/third_party/fmt/include -isystem /home/runner/work/subspace/subspace/third_party/googletest/googletest/include -isystem /home/runner/work/subspace/subspace/third_party/googletest/googletest -isystem /usr/include/c++/13 -isystem /usr/include/x86_64-linux-gnu/c++/13 -isystem /usr/include/c++/13/backward -isystem /usr/lib/gcc/x86_64-linux-gnu/13/include -isystem /usr/local/include -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include -O3 -DNDEBUG -std=gnu++20 -fno-rtti -Werror -MD -MT sus/CMakeFiles/subspace_unittests.dir/iter/iterator_unittest.cc.o -MF sus/CMakeFiles/subspace_unittests.dir/iter/iterator_unittest.cc.o.d -o sus/CMakeFiles/subspace_unittests.dir/iter/iterator_unittest.cc.o -c /home/runner/work/subspace/subspace/sus/iter/iterator_unittest.cc

I think it's simplest to just do a git clone and build that, though, as I can't easily minimize this.
[Bug c++/110158] Cannot use union with std::string inside in constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110158

--- Comment #4 from danakj at orodu dot net ---
Here's a repro without the std::string inside a union. It is the SSO union inside the string that causes the error.

https://gcc.godbolt.org/z/T8oM8vYnq

```
#include <string>

template <class T, class I, class S, class F>
constexpr T fold(T init, I i, S s, F f) {
  while (true) {
    if (i == s)
      return init;
    else
      init = f(std::move(init), *i++);
  }
}

constexpr char v[] = {'a', 'b', 'c'};
static_assert(fold(std::string(), std::begin(v), std::end(v),
                   [](std::string acc, char v) {
                     acc.push_back(v);
                     return acc;
                   }) == "abc");

int main() {}
```

<source>:18:23: error: non-constant condition for static assertion
   14 | static_assert(fold(std::string(), std::begin(v), std::end(v),
      | ~~~
   15 |[](std::string acc, char v) {
      |~
   16 |acc.push_back(v);
      |~
   17 |return acc;
      |~~~
   18 |}) == "abc");
      |~~~^~~~
<source>:18:32: in 'constexpr' expansion of 'fold(T, I, S, F) [with T = std::__cxx11::basic_string; I = const char*; S = const char*; F = ](std::begin(v), std::end(v), ((), ()))'
<source>:18:32: in 'constexpr' expansion of 'std::__cxx11::basic_string((* & std::move<__cxx11::basic_string&>(init)))'
<source>:18:23: error: accessing 'std::__cxx11::basic_string_M_allocated_capacity' member instead of initialized 'std::__cxx11::basic_string_M_local_buf' member in constant expression
ASM generation compiler returned: 1
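As a cross-check on the claim that the string's internal union is the culprit, here is a sketch of the same fold shape with an int accumulator instead of std::string. It assumes C++14 or later. The `Add` functor is a hypothetical stand-in for the lambda (lambdas are only usable in constant expressions from C++17), and the template parameter list is written out by hand. With no std::string involved, the static assertion is expected to be a valid constant expression:

```cpp
#include <iterator>
#include <utility>

// Same loop structure as the fold() in the repro above.
template <class T, class I, class S, class F>
constexpr T fold(T init, I i, S s, F f) {
  while (true) {
    if (i == s)
      return init;
    else
      init = f(std::move(init), *i++);
  }
}

// Hypothetical accumulator: maps 'a' -> 1, 'b' -> 2, ... and sums them.
struct Add {
  constexpr int operator()(int acc, char c) const {
    return acc + (c - 'a' + 1);
  }
};

constexpr char v[] = {'a', 'b', 'c'};

// 1 + 2 + 3 == 6, evaluated entirely at compile time.
static_assert(fold(0, std::begin(v), std::end(v), Add{}) == 6,
              "fold over a char array is a constant expression");
```

If this compiles while the std::string variant does not, the difference lies in how the accumulator type is copied or moved during constant evaluation, not in the fold helper itself.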
[Bug libstdc++/110900] std::string initializes SSO object subfield without making the SSO object active in the union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 --- Comment #6 from danakj at orodu dot net --- Thanks for the link, I used the godbolt from that bug to set up the right environment and that let me minimize it. I posted it into the dupe bug.
[Bug driver/110901] New: -march does not override -mcpu on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901 Bug ID: 110901 Summary: -march does not override -mcpu on aarch64 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: raj.khem at gmail dot com Target Milestone: --- As per https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mcpu When -march is used then relevant part of -mcpu are overridden by that. However this seems to be not happening in following case with GCC13 a.s === .text ptrue p0.b = aarch64-yoe-linux-gcc -mcpu=cortex-a72.cortex-a53 -mbranch-protection=standard --sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot -fuse-ld=bfd -c -march=armv8.2-a+sve a.s -v Using built-in specs. COLLECT_GCC=../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-gcc Target: aarch64-yoe-linux Configured with: ../../../../../../work-shared/gcc-13.2.0-r0/gcc-13.2.0/configure --build=x86_64-linux --host=x86_64-linux --target=aarch64-yoe-linux --prefix=/host-native/usr --exec_prefix=/host-native/usr --bindir=/host-native/usr/bin/aarch64-yoe-linux --sbindir=/host-native/usr/bin/aarch64-yoe-linux --libexecdir=/host-native/usr/libexec/aarch64-yoe-linux --datadir=/host-native/usr/share --sysconfdir=/host-native/etc --sharedstatedir=/host-native/com --localstatedir=/host-native/var --libdir=/host-native/usr/lib/aarch64-yoe-linux --includedir=/host-native/usr/include --oldincludedir=/host-native/usr/include --infodir=/host-native/usr/share/info --mandir=/host-native/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/host-native --enable-clocale=generic --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --disable-multilib --enable-default-pie --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=aarch64-yoe-linux- --without-local-prefix 
--disable-install-libiberty --disable-libssp --enable-libitm --enable-lto --disable-bootstrap --with-system-zlib --with-linker-hash-style=sysv --enable-linker-build-id --with-ppl=no --with-cloog=no --enable-checking=release --enable-cheaders=c_global --without-isl --with-gxx-include-dir=/not/exist/usr/include/c++/13.2.0 --with-sysroot=/not/exist --with-build-sysroot=/host --enable-poison-system-directories=error --with-system-zlib --disable-static --disable-nls --with-glibc-version=2.28 --enable-initfini-array --enable-__cxa_atexit Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (GCC) COLLECT_GCC_OPTIONS='-mcpu=cortex-a72.cortex-a53' '-mbranch-protection=standard' '--sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot' '-fuse-ld=bfd' '-c' '-march=armv8.2-a+sve' '-v' '-mlittle-endian' '-mabi=lp64' /mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot-native/usr/bin/aarch64-yoe-linux/../../libexec/aarch64-yoe-linux/gcc/aarch64-yoe-linux/13.2.0/as -v -EL -march=armv8.2-a+sve -march=armv8-a+crc -mabi=lp64 -o a.o a.s GNU assembler version 2.41.0 (aarch64-yoe-linux) using BFD version (GNU Binutils) 2.41.0.20230731 a.s: Assembler messages: a.s:2: Error: selected processor does not support `ptrue p0.b' However if I remove -mcpu=cortex-a72.cortex-a53 or change it to -mcpu=cortex-a72.cortex-a53+sve then it works ok. Interesting part is -march values in the assembler commandline order. as -v -EL -march=armv8.2-a+sve -march=armv8-a+crc -mabi=lp64 -o a.o a.s as we can see the -march computed from -mcpu is specified *after* the -march passed by user. is this a bug?
[Bug target/110202] _mm512_ternarylogic_epi64 generates unnecessary operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202

--- Comment #11 from CVS Commits ---
The master branch has been updated by Alexander Monakov:

https://gcc.gnu.org/g:567d06bb357a39ece865cef67ada44124f227e45

commit r14-2999-g567d06bb357a39ece865cef67ada44124f227e45
Author: Yan Simonaytes
Date:   Tue Jul 25 20:43:19 2023 +0300

    i386: eliminate redundant operands of VPTERNLOG

    As mentioned in PR 110202, GCC may be presented with input where control
    word of the VPTERNLOG intrinsic implies that some of its operands do not
    affect the result.  In that case, we can eliminate redundant operands
    of the instruction by substituting any other operand in their place.
    This removes false dependencies.

    For instance, instead of (252 = 0xfc = _MM_TERNLOG_A | _MM_TERNLOG_B)

        vpternlogq      $252, %zmm2, %zmm1, %zmm0

    emit

        vpternlogq      $252, %zmm0, %zmm1, %zmm0

    When VPTERNLOG is invariant w.r.t first and second operands, and the
    third operand is memory, load memory into the output operand first, i.e.
    instead of (85 = 0x55 = ~_MM_TERNLOG_C)

        vpternlogq      $85, (%rdi), %zmm1, %zmm0

    emit

        vmovdqa64       (%rdi), %zmm0
        vpternlogq      $85, %zmm0, %zmm0, %zmm0

    gcc/ChangeLog:

            PR target/110202
            * config/i386/i386-protos.h
            (vpternlog_redundant_operand_mask): Declare.
            (substitute_vpternlog_operands): Declare.
            * config/i386/i386.cc
            (vpternlog_redundant_operand_mask): New helper.
            (substitute_vpternlog_operands): New function.  Use them...
            * config/i386/sse.md: ... here in new VPTERNLOG define_splits.

    gcc/testsuite/ChangeLog:

            PR target/110202
            * gcc.target/i386/invariant-ternlog-1.c: New test.
            * gcc.target/i386/invariant-ternlog-2.c: New test.
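The notion of a "redundant operand" in the commit message can be sketched independently of GCC. The 8-bit control word is a truth table over three inputs A, B and C, with the masks _MM_TERNLOG_A = 0xF0, _MM_TERNLOG_B = 0xCC and _MM_TERNLOG_C = 0xAA. An input is redundant when the table rows where it is 1 equal the rows where it is 0. The function names and the bit layout of the returned mask below are assumptions made for this illustration. This is not the code of GCC's vpternlog_redundant_operand_mask:

```cpp
#include <cstdint>

// Truth-table masks of the three inputs (values of _MM_TERNLOG_A/B/C).
constexpr std::uint8_t kTernlogA = 0xF0;
constexpr std::uint8_t kTernlogB = 0xCC;
constexpr std::uint8_t kTernlogC = 0xAA;

// Rows where the input is 1 sit `shift` bits above the rows where it is 0.
// The input is redundant when both halves of the table agree.
constexpr bool operand_is_redundant(std::uint8_t imm, std::uint8_t mask,
                                    int shift) {
  return ((imm & mask) >> shift) == (imm & (mask >> shift));
}

// Hypothetical encoding: bit 0 = A redundant, bit 1 = B, bit 2 = C.
constexpr unsigned redundant_operands(std::uint8_t imm) {
  return (operand_is_redundant(imm, kTernlogA, 4) ? 1u : 0u) |
         (operand_is_redundant(imm, kTernlogB, 2) ? 2u : 0u) |
         (operand_is_redundant(imm, kTernlogC, 1) ? 4u : 0u);
}

// 0xFC = A | B: only C is redundant.
static_assert(redundant_operands(0xFC) == 4u, "A | B ignores C");
// 0x55 = ~C: A and B are both redundant.
static_assert(redundant_operands(0x55) == 3u, "~C ignores A and B");
```

Both results agree with the two examples in the commit message: for $252 the third input can be replaced by any other operand, and for $85 the first two can.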
[Bug analyzer/110902] New: Missing cast in region_model_manager::maybe_fold_binop on MULT_EXPR by 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110902 Bug ID: 110902 Summary: Missing cast in region_model_manager::maybe_fold_binop on MULT_EXPR by 1 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: analyzer Assignee: dmalcolm at gcc dot gnu.org Reporter: dmalcolm at gcc dot gnu.org Target Milestone: --- Whilst trying to fix PR analyzer/110426, I noticed that region_model_manager::maybe_fold_binop doesn't always return the correct type; specifically, it fails to cast to TYPE when folding (VAL * 1) -> VAL: diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc index 46d271a295c..010906f1ec0 100644 --- a/gcc/analyzer/region-model-manager.cc +++ b/gcc/analyzer/region-model-manager.cc @@ -654,7 +654,7 @@ region_model_manager::maybe_fold_binop (tree type, enum tree_code op, return get_or_create_constant_svalue (build_int_cst (type, 0)); /* (VAL * 1) -> VAL. */ if (cst1 && integer_onep (cst1)) - return arg0; + return get_or_create_cast (type, arg0); break; case BIT_AND_EXPR: if (cst1) However, on adding the above cast, various bounds-checking tests fail, seemingly due to confusion about ptrdiff_t vs size_t, and how to compare such values: FAIL: gcc.dg/analyzer/flexible-array-member-1.c (test for warnings, line 96) With -m64: FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c (test for warnings, line 19) FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c (test for warnings, line 24) FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c expected multiline pattern lines 29-44
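Why the missing cast matters can be shown with a toy model. Everything below is hypothetical: `Ty` and `SVal` are simplified stand-ins for the analyzer's `tree` types and svalues, and the two functions only mimic the shape of the fold before and after the proposed patch. The point is that the caller asks for a result of type `type`, while the unpatched fold hands back `arg0` with whatever type it already had:

```cpp
#include <cstdint>

// Hypothetical stand-ins, not the analyzer's real data structures.
enum class Ty { Int32, Int64 };
struct SVal {
  Ty type;
  std::int64_t value;
};

// Shape of the fold before the patch: (VAL * 1) -> VAL, type untouched.
constexpr SVal fold_mul_one_unpatched(Ty /*type*/, SVal arg0) {
  return arg0;
}

// Shape of the fold after the patch: the result carries the requested type.
constexpr SVal fold_mul_one_with_cast(Ty type, SVal arg0) {
  return SVal{type, arg0.value};
}

// A 32-bit operand folded in a 64-bit multiplication.
static_assert(fold_mul_one_unpatched(Ty::Int64, SVal{Ty::Int32, 7}).type ==
                  Ty::Int32,
              "unpatched fold leaks the operand's type");
static_assert(fold_mul_one_with_cast(Ty::Int64, SVal{Ty::Int32, 7}).type ==
                  Ty::Int64,
              "patched fold returns the requested type");
```

In this model the value is the same either way and only the type tag differs, which is consistent with the follow-on failures being about comparing ptrdiff_t and size_t values rather than about wrong numbers.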
[Bug target/110901] -march does not override -mcpu on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901 --- Comment #1 from Andrew Pinski --- Order matters. In this case -march is after -mcpu ...
[Bug target/110901] -march does not override -mcpu on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

--- Comment #2 from Khem Raj ---
(In reply to Andrew Pinski from comment #1)
> Order matters. In this case -march is after -mcpu ...

It does not seem to be effective in this case. I tried specifying -mcpu after -march and vice versa; the result is the same.

% ../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-gcc -mbranch-protection=standard --sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot -fuse-ld=bfd -c -march=armv8.2-a+sve -mcpu=cortex-a72.cortex-a53 a.s
a.s: Assembler messages:
a.s:2: Error: selected processor does not support `ptrue p0.b'

../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-gcc -mbranch-protection=standard --sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot -fuse-ld=bfd -c -mcpu=cortex-a72.cortex-a53 -march=armv8.2-a+sve a.s
a.s: Assembler messages:
a.s:2: Error: selected processor does not support `ptrue p0.b'
[Bug target/110901] -march does not override -mcpu (big.little on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

--- Comment #3 from Andrew Pinski ---
With C code, this use of -march and -mcpu would normally even be rejected.
[Bug ada/110898] compilation of adacl-assert-integer.ads failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110898

Martin Krischik changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #2 from Martin Krischik ---
(In reply to Marc Poulhiès from comment #1)
> I've checked and I also get the same errors with gcc 11.x, so that's not
> something new. I think your code should be fixed here.

Yes, those error messages make sense, especially the „error: child of a generic package must be a generic unit“. That is indeed a problem on my side. Thanks for checking.

What confused me was the not at all helpful “compilation of adacl-assert-integer.ads failed”, with the proper error message nowhere to be seen. But that is probably an Alire problem.

I'll close the bug.
[Bug middle-end/94442] [11/12/13/14 regression] Redundant loads/stores emitted at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Target Milestone|11.5|11.0 Resolution|--- |FIXED --- Comment #13 from Andrew Pinski --- Fixed by r11-6794-g04b472ad0e1dc93abafe .
[Bug target/95958] [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 Bug 95958 depends on bug 94442, which changed state. Bug 94442 Summary: [11/12/13/14 regression] Redundant loads/stores emitted at -O3 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 95084, which changed state. Bug 95084 Summary: [11/12/13/14 Regression] code sinking prevents if-conversion https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE
[Bug tree-optimization/95084] [11/12/13/14 Regression] code sinking prevents if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084

Andrew Pinski changed:

           What    |Removed                     |Added
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #6 from Andrew Pinski ---
This was fixed by the patch which fixed PR 92335. Since that bug is still open as a regression like this one, and the two are exactly the same issue, I am going to close this one as a dup of bug 92335.

*** This bug has been marked as a duplicate of bug 92335 ***
[Bug tree-optimization/92335] [11/12/13 Regression] sinking of loads happen too early which causes vectorization not to be done
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335 --- Comment #10 from Andrew Pinski --- *** Bug 95084 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110903] New: [14 Regression] Dead Code Elimination Regression since r14-1597-g64d90d06d2d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

Bug ID: 110903
Summary: [14 Regression] Dead Code Elimination Regression since r14-1597-g64d90d06d2d
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: theodort at inf dot ethz.ch
Target Milestone: ---

https://godbolt.org/z/7of4jjM3K

Given the following code:

void foo(void);
static char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i ? h : 0; }
int main() {
    {
        char j;
        short k;
        for (; g >= 10; g = (short)g) {
            int l = 1, m = 0;
            j = 8 * k;
            k = j <= 0;
            f = c + 3;
            for (; c < 2; c = f) {
                char n = 4073709551615;
                if (!(((m) >= 0) && ((m) <= 0))) {
                    __builtin_unreachable();
                }
                if (g)
                    ;
                else {
                    if ((m = k, (b = a(d(l, k), e) && n) || l) < k)
                        foo();
                    e = l = 0;
                }
            }
        }
    }
}

gcc-trunk -O3 does not eliminate the call to foo:

main:
        movl    g(%rip), %edi
        cmpl    $9, %edi
        jle     .L25
        pushq   %rbp
        movl    %edi, %ecx
        movl    $1, %ebp
        movl    $1, %esi
        pushq   %rbx
        movl    $1, %ebx
        subq    $8, %rsp
        movzbl  c(%rip), %edx
        movsbw  %dl, %ax
        addl    $3, %eax
        movw    %ax, f(%rip)
        cmpb    $1, %dl
        jg      .L12
        .p2align 4,,10
        .p2align 3
.L6:
        testl   %edi, %edi
        je      .L7
        movb    %al, c(%rip)
        movsbw  %al, %dx
        cmpb    $1, %al
        jle     .L6
.L9:
        movswl  %di, %ecx
        movl    %ecx, g(%rip)
        cmpl    $9, %ecx
        jle     .L17
        addl    $3, %edx
        movw    %dx, f(%rip)
.L12:
        movswl  %cx, %eax
        cmpw    $9, %cx
        jle     .L29
.L4:
        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L7:
        movswl  e(%rip), %ecx
        movl    %ebx, %edx
        andl    %esi, %edx
        addl    %ecx, %edx
        orl     %esi, %edx
        jne     .L10
        testb   %bpl, %bpl
        jne     .L30
.L10:
        xorl    %edx, %edx
        movb    %al, c(%rip)
        movw    %dx, e(%rip)
        movsbw  %al, %dx
        cmpb    $1, %al
        jg      .L9
        xorl    %esi, %esi
        jmp     .L6
        .p2align 4,,10
        .p2align 3
.L30:
        call    foo
        movzwl  f(%rip), %eax
        movl    g(%rip), %edi
        jmp     .L10
.L29:
        movl    %eax, g(%rip)
.L17:
        addq    $8, %rsp
        xorl    %eax, %eax
        popq    %rbx
        popq    %rbp
        ret
.L25:
        xorl    %eax, %eax
        ret

gcc-13.2.0 -O3 eliminates the call to foo:

main:
        movl    g(%rip), %esi
        movl    %esi, %ecx
        cmpl    $9, %esi
        jle     .L14
        movzbl  c(%rip), %eax
        movsbw  %al, %dx
        addl    $3, %edx
        movw    %dx, f(%rip)
        cmpb    $1, %al
        jg      .L12
        xorl    %eax, %eax
        testb   %al, %al
        movl    %edx, %eax
        je      .L6
        cmpb    $1, %dl
        jg      .L22
.L7:
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L22:
        movb    %dl, c(%rip)
.L8:
        movswl  %si, %ecx
        movl    %ecx, g(%rip)
        cmpl    $9, %ecx
        jle     .L14
        addl    $3, %eax
        cbtw
        movw    %ax, f(%rip)
.L12:
        movswl  %cx, %eax
        cmpw    $9, %cx
        jle     .L23
.L4:
        jmp     .L4
        .p2align 4,,10
        .p2align 3
.L6:
        movb    %dl, c(%rip)
        cmpw    $1, %dx
        jg      .L8
        .p2align 4,,10
        .p2align 3
.L9:
        movl    g(%rip), %eax
        testl   %eax, %eax
        jne     .L9
        movw    $0, e(%rip)
        movb    %dl, c(%rip)
.L23:
        movl    %eax, g(%rip)
.L14:
        xorl    %eax, %eax
        ret

Bisects to r14-1597-g64d90d06d2d
[Bug tree-optimization/110903] [14 Regression] Dead Code Elimination Regression since r14-1597-g64d90d06d2d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

--- Comment #1 from Andrew Pinski ---
Note the original testcase has an obvious use of an uninitialized variable. Anyway, here is a fixed-up testcase which does not have that uninitialized variable, and GCC 13 was still able to optimize away the call to foo:

```
void foo(void);
static signed char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i ? h : 0; }
short t = 10;
int main() {
    {
        signed char j;
        short k = t;
        for (; g >= 10; g = (short)g) {
            _Bool l = 1;
            int m = 0;
            j = 8 * k;
            k = j <= 0;
            f = c + 3;
            for (; c < 2; c = f) {
                signed char n = 4073709551615;
                if (!(((m) >= 0) && ((m) <= 0))) {
                    __builtin_unreachable();
                }
                if (g)
                    ;
                else {
                    if ((m = k, (b = a(d(l, k), e) && n) || l) < k)
                        foo();
                    e = l = 0;
                }
            }
        }
    }
}
```
[Bug middle-end/110857] aarch64-linux-gnu profiledbootstrap broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110857 --- Comment #6 from prathamesh3492 at gcc dot gnu.org --- profiledbootstrap now works on aarch64-linux-gnu, thanks!
[Bug c++/110904] New: __is_convertible incorrectly reports non-referenceable function prototypes as convertible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110904

Bug ID: 110904
Summary: __is_convertible incorrectly reports non-referenceable function prototypes as convertible
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: nikolasklauser at berlin dot de
Target Milestone: ---

```
#include <type_traits>

using Function = void();
using ConstFunction = void() const;

static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), ""); // convertible
static_assert((!std::is_convertible::value), ""); // convertible
static_assert((!std::is_convertible::value), ""); // convertible
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
```

__is_convertible() claims that the cases marked above are convertible, but AFAICT that shouldn't be true. According to the standard,
```
To test() {
  return declval<From>();
}
```
has to be well formed, but that's never the case for `ConstFunction`.
[Bug c++/110904] __is_convertible incorrectly reports non-referenceable function prototypes as convertible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110904 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Andrew Pinski --- Dup of bug 109680. *** This bug has been marked as a duplicate of bug 109680 ***
[Bug c++/109680] [13 Regression] is_convertible incorrectly true
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109680 Andrew Pinski changed: What|Removed |Added CC||nikolasklauser at berlin dot de --- Comment #14 from Andrew Pinski --- *** Bug 110904 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110903] [12/13/14 Regression] Dead Code Elimination Regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

Andrew Pinski changed:

           What    |Removed                     |Added
   Last reconfirmed|                            |2023-08-04
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |12.4
     Ever confirmed|0                           |1
           Keywords|                            |needs-bisection
            Summary|[14 Regression] Dead Code   |[12/13/14 Regression] Dead
                   |Elimination Regression      |Code Elimination Regression
                   |since r14-1597-g64d90d06d2d |

--- Comment #2 from Andrew Pinski ---
Confirmed.

Before r14-1597, jump threading happened with respect to:

  if (j_32 <= 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 238907556]:

  [local count: 477815112]:
  # iftmp.10_37 = PHI <_11(7), 0(8)>

After it, we change that into

  iftmp.10_37 = _11 & (j_32 <= 0);

It just happens that we depend on that due to:

  _43 = l_22 | _25;
  _39 = j_32 <= 0;
  _12 = ~_43;
  _44 = _12 & _39;

If we change the code to be:
```
void foo(void);
static signed char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i & h;}//i ? h : 0; }
short t = 10;
int main() {
    {
        signed char j;
        short k = t;
        for (; g >= 10; g = (short)g) {
            _Bool l = 1;
            int m = 0;
            j = 8 * k;
            k = j <= 0;
            f = c + 3;
            for (; c < 2; c = f) {
                signed char n = 4073709551615;
                if (!(((m) >= 0) && ((m) <= 0))) {
                    __builtin_unreachable();
                }
                if (g)
                    ;
                else {
                    if ((m = k, (b = a(d(l, k), e) && n) || l) < k)
                        foo();
                    e = l = 0;
                }
            }
        }
    }
}
```

GCC 11 is able to remove the call to foo but GCC 12 cannot. The IR for the part that phiopt2 changes on the trunk is similar enough, so this is instead a regression from GCC 11.
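The rewrite of `d` from `i ? h : 0` to `i & h` relies on both arguments being 0 or 1 at the call site (`l` is a `_Bool` and `k` is the result of `j <= 0`). A minimal sketch of that equivalence, with hypothetical function names, mirrors the two IR shapes quoted above: a PHI selecting between a value and 0, and the branchless AND that replaces it:

```cpp
// Branchy form: what PHI <h, 0> selected by a test on i computes.
constexpr int select_form(bool h, bool i) { return i ? h : 0; }

// Branchless form, as in "iftmp = _11 & (j <= 0)".
constexpr int and_form(bool h, bool i) { return h & i; }

// Exhaustive check over the four boolean input combinations.
constexpr bool forms_agree() {
  return select_form(false, false) == and_form(false, false) &&
         select_form(false, true) == and_form(false, true) &&
         select_form(true, false) == and_form(true, false) &&
         select_form(true, true) == and_form(true, true);
}

static_assert(forms_agree(), "select and AND agree on 0/1 inputs");
```

The equivalence does not hold for arbitrary ints (for example h = 2, i = 1 gives 2 versus 0), so the sketch only covers the 0/1 case used in the testcase.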
[Bug c++/110905] New: GCC rejects constexpr code that may re-initialize union member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905 Bug ID: 110905 Summary: GCC rejects constexpr code that may re-initialize union member Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: danakj at orodu dot net Target Milestone: --- Godbolt: https://gcc.godbolt.org/z/v5anxqnP1 This repro contains a std::optional (which has a union) and it sets the union in a loop. Doing so causes GCC to reject the code as not being a constant expression. The error I was getting in my project was far more descriptive, with it trying to call the deleted constructor of the union. error: use of deleted function ‘sus::option::__private::Storage, false>()’ In my more minimal test case the error is more terse and less clear. :62:59: error: non-constant condition for static assertion 62 | static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3 + 4 + 5); | ^~~~ :62:51: error: '(((const std::vector >*)(&)) != 0)' is not a constant expression 62 | static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3 + 4 + 5); | ```cpp #include #include template struct VectorIter { constexpr std::optional next() { if (front == back) return std::optional(); T& item = v[front]; front += 1u; return std::optional(std::move(item)); } constexpr VectorIter(std::vector v2) : v(std::move(v2)), front(0u), back(v.size()) {} VectorIter(VectorIter&&) = default; VectorIter& operator=(VectorIter&&) = default; std::vector v; size_t front; size_t back; }; template struct Flatten { constexpr Flatten(std::vector> v) : vec(std::move(v)) {} constexpr std::optional next() { std::optional out; while (true) { // Take an item off front_iter_ if possible. if (front_iter_.has_value()) { out = front_iter_.value().next(); if (out.has_value()) return out; front_iter_ = std::nullopt; } // Otherwise grab the next vector into front_iter_. 
if (!vec.empty()) { std::vector v = std::move(vec[0]); vec.erase(vec.begin()); front_iter_.emplace([](auto&& iter) { return VectorIter(std::move(iter)); }(std::move(v))); } if (!front_iter_.has_value()) break; } return out; } constexpr T sum() && { T out = T(); while (true) { std::optional i = next(); if (!i.has_value()) break; out += *i; } return out; } std::vector> vec; std::optional> front_iter_; }; static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3 + 4 + 5); int main() {} ``` When the Flatten::next() method is simplified a bit, so that it can see the union is only initialized once, the GCC compiler no longer rejects the code. https://gcc.godbolt.org/z/szfGsdxb7 ```cpp #include #include template struct VectorIter { constexpr std::optional next() { if (front == back) return std::optional(); T& item = v[front]; front += 1u; return std::optional(std::move(item)); } constexpr VectorIter(std::vector v2) : v(std::move(v2)), front(0u), back(v.size()) {} VectorIter(VectorIter&&) = default; VectorIter& operator=(VectorIter&&) = default; std::vector v; size_t front; size_t back; }; template struct Flatten { constexpr Flatten(std::vector v) : vec(std::move(v)) {} constexpr std::optional next() { std::optional out; while (true) { // Take an item off front_iter_ if possible. if (front_iter_.has_value()) { out = front_iter_.value().next(); if (out.has_value()) return out; front_iter_ = std::nullopt; } // Otherwise grab the next vector into front_iter_. if (!moved) { std::vector v = std::move(vec); moved = true; front_iter_.emplace([](auto&& iter) { return VectorIter(std::move(iter)); }(std::move(v))); } if (!front_iter_.has_value()) break; } return out; } constexpr T sum() && { T out = T(); while (true) { std::optional i = next(); if (!i.has_value()) break; out += *i; } return out; } bool moved = false; std::vector vec; std::optional> front_iter_; }; static_assert(Flatten({1, 2, 3}).sum() == 1 + 2 + 3); int main() {} ``` Yet in the first example, the GCC com
[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

Andrew Pinski changed:

What             |Removed     |Added
-----------------+------------+-----------
Last reconfirmed |            |2023-08-04
Status           |UNCONFIRMED |WAITING
Ever confirmed   |0           |1

--- Comment #1 from Andrew Pinski ---
> In my more minimal test case the error is more terse and less clear.

The reduced testcase is a different issue and is a dup of bug 85944.
In the first testcase provided below, if we move the static_assert into main instead of the toplevel, it gets accepted.
I think you need to redo your reduction.
[Bug other/109910] GCC prologue/epilogue saves/restores callee-saved registers that are never changed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109910

Georg-Johann Lay changed:

What             |Removed     |Added
-----------------+------------+-----------
Last reconfirmed |            |2023-08-04
Status           |UNCONFIRMED |NEW
Ever confirmed   |0           |1
[Bug tree-optimization/32806] Missing optimization to remove backward dependencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32806 --- Comment #2 from Andrew Pinski --- Created attachment 55689 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55689&action=edit compilable testcase
[Bug tree-optimization/30049] Variable-length arrays (VLA) should be converted to normal arrays if possible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30049

--- Comment #2 from Andrew Pinski ---
Created attachment 55690
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55690&action=edit
testcase

[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/gcc t.c -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g
[apinski@xeond2 upstream-gcc-git]$ time ./a.out
real 0m1.522s
user 0m1.517s
sys  0m0.001s
[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/gcc t.c -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g -DNORMAL_ARRAY
[apinski@xeond2 upstream-gcc-git]$ time ./a.out
real 0m0.356s
user 0m0.352s
sys  0m0.002s
[Bug tree-optimization/30049] Variable-length arrays (VLA) should be converted to normal arrays if possible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30049 --- Comment #3 from Andrew Pinski --- The only difference I saw is scheduling and some small IV-OPTs difference ...
[Bug tree-optimization/35224] scalar evolution analysis fails with "evolution of base is not affine"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35224 --- Comment #1 from Andrew Pinski --- Created attachment 55691 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55691&action=edit testcase
[Bug tree-optimization/49955] Fails to do partial basic-block SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955 --- Comment #4 from Andrew Pinski --- The testcase in comment #0 started to be vectorized in GCC 13
[Bug tree-optimization/18437] vectorizer failed for matrix multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437 --- Comment #9 from Andrew Pinski --- For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC can vectorize it on aarch64 but not on x86_64.
[Bug analyzer/110426] Missing buffer overflow warning with function pointer that has the alloc_size attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110426 --- Comment #2 from CVS Commits --- The master branch has been updated by David Malcolm : https://gcc.gnu.org/g:021077b94741c9300dfff3a24e95b3ffa3f508a7 commit r14-3001-g021077b94741c9300dfff3a24e95b3ffa3f508a7 Author: David Malcolm Date: Fri Aug 4 16:18:40 2023 -0400 analyzer: handle function attribute "alloc_size" [PR110426] This patch makes -fanalyzer make use of the function attribute "alloc_size", allowing -fanalyzer to emit -Wanalyzer-allocation-size, -Wanalyzer-out-of-bounds, and -Wanalyzer-tainted-allocation-size on execution paths involving allocations using such functions. gcc/analyzer/ChangeLog: PR analyzer/110426 * bounds-checking.cc (region_model::check_region_bounds): Handle symbolic base regions. * call-details.cc: Include "stringpool.h" and "attribs.h". (call_details::lookup_function_attribute): New function. * call-details.h (call_details::lookup_function_attribute): New function decl. * region-model-manager.cc (region_model_manager::maybe_fold_binop): Add reference to PR analyzer/110902. * region-model-reachability.cc (reachable_regions::handle_sval): Add symbolic regions for pointers that are conjured svalues for the LHS of a stmt. * region-model.cc (region_model::canonicalize): Purge dynamic extents for regions that aren't referenced. (get_result_size_in_bytes): New function. (region_model::on_call_pre): Use get_result_size_in_bytes and potentially set the dynamic extents of the region pointed to by the return value. (region_model::deref_rvalue): Add param "add_nonnull_constraint" and use it to conditionalize adding the constraint. (pending_diagnostic_subclass::dubious_allocation_size): Add "stmt" param to both ctors and use it to initialize new "m_stmt" field. (pending_diagnostic_subclass::operator==): Use m_stmt; don't use m_lhs or m_rhs. (pending_diagnostic_subclass::m_stmt): New field. 
(region_model::check_region_size): Generalize to any kind of pointer svalue by using deref_rvalue rather than checking for region_svalue. Pass stmt to dubious_allocation_size ctor. * region-model.h (region_model::deref_rvalue): Add param "add_nonnull_constraint". * svalue.cc (conjured_svalue::lhs_value_p): New function. * svalue.h (conjured_svalue::lhs_value_p): New decl. gcc/testsuite/ChangeLog: PR analyzer/110426 * gcc.dg/analyzer/allocation-size-1.c: Update expected message to reflect consolidation of size and assignment into a single event. * gcc.dg/analyzer/allocation-size-2.c: Likewise. * gcc.dg/analyzer/allocation-size-3.c: Likewise. * gcc.dg/analyzer/allocation-size-4.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-1.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise. * gcc.dg/analyzer/attr-alloc_size-1.c: New test. * gcc.dg/analyzer/attr-alloc_size-2.c: New test. * gcc.dg/analyzer/attr-alloc_size-3.c: New test. * gcc.dg/analyzer/explode-4.c: New test. * gcc.dg/analyzer/taint-size-1.c: Add test coverage for __attribute__ alloc_size. Signed-off-by: David Malcolm
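To illustrate the kind of code this commit is about, here is a minimal sketch of a function carrying the alloc_size attribute. The names `my_alloc` and `fill_and_sum` are hypothetical and do not come from the patch or its tests. The tests added by the commit are C files, so the C++ spelling here is only for illustration, and whether -fanalyzer reports the commented-out access for this exact snippet is an assumption, not something verified.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator wrapper (not from the patch). alloc_size(1) tells
// GCC that the returned buffer is as large as parameter 1, in bytes.
__attribute__((alloc_size(1))) void* my_alloc(std::size_t nbytes) {
    return std::malloc(nbytes);
}

// Hypothetical caller: fills a buffer of n ints with 0..n-1 and sums them.
// Returns -1 if the allocation fails.
int fill_and_sum(std::size_t n) {
    int* p = static_cast<int*>(my_alloc(n * sizeof(int)));
    if (!p)
        return -1;
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);  // in bounds: i < n
        sum += p[i];
    }
    // p[n] = 0;  // one past the end: the sort of access that
    //            // -Wanalyzer-out-of-bounds is meant to catch once the
    //            // buffer size is known from alloc_size (assumption).
    std::free(p);
    return sum;
}
```

With the attribute in place the size of the returned region is visible at the call site, which is the information the commit message says the analyzer now uses.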
[Bug analyzer/110902] Missing cast in region_model_manager::maybe_fold_binop on MULT_EXPR by 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110902 --- Comment #1 from CVS Commits --- The master branch has been updated by David Malcolm : https://gcc.gnu.org/g:021077b94741c9300dfff3a24e95b3ffa3f508a7 commit r14-3001-g021077b94741c9300dfff3a24e95b3ffa3f508a7 Author: David Malcolm Date: Fri Aug 4 16:18:40 2023 -0400 analyzer: handle function attribute "alloc_size" [PR110426] This patch makes -fanalyzer make use of the function attribute "alloc_size", allowing -fanalyzer to emit -Wanalyzer-allocation-size, -Wanalyzer-out-of-bounds, and -Wanalyzer-tainted-allocation-size on execution paths involving allocations using such functions. gcc/analyzer/ChangeLog: PR analyzer/110426 * bounds-checking.cc (region_model::check_region_bounds): Handle symbolic base regions. * call-details.cc: Include "stringpool.h" and "attribs.h". (call_details::lookup_function_attribute): New function. * call-details.h (call_details::lookup_function_attribute): New function decl. * region-model-manager.cc (region_model_manager::maybe_fold_binop): Add reference to PR analyzer/110902. * region-model-reachability.cc (reachable_regions::handle_sval): Add symbolic regions for pointers that are conjured svalues for the LHS of a stmt. * region-model.cc (region_model::canonicalize): Purge dynamic extents for regions that aren't referenced. (get_result_size_in_bytes): New function. (region_model::on_call_pre): Use get_result_size_in_bytes and potentially set the dynamic extents of the region pointed to by the return value. (region_model::deref_rvalue): Add param "add_nonnull_constraint" and use it to conditionalize adding the constraint. (pending_diagnostic_subclass::dubious_allocation_size): Add "stmt" param to both ctors and use it to initialize new "m_stmt" field. (pending_diagnostic_subclass::operator==): Use m_stmt; don't use m_lhs or m_rhs. (pending_diagnostic_subclass::m_stmt): New field. 
(region_model::check_region_size): Generalize to any kind of pointer svalue by using deref_rvalue rather than checking for region_svalue. Pass stmt to dubious_allocation_size ctor. * region-model.h (region_model::deref_rvalue): Add param "add_nonnull_constraint". * svalue.cc (conjured_svalue::lhs_value_p): New function. * svalue.h (conjured_svalue::lhs_value_p): New decl. gcc/testsuite/ChangeLog: PR analyzer/110426 * gcc.dg/analyzer/allocation-size-1.c: Update expected message to reflect consolidation of size and assignment into a single event. * gcc.dg/analyzer/allocation-size-2.c: Likewise. * gcc.dg/analyzer/allocation-size-3.c: Likewise. * gcc.dg/analyzer/allocation-size-4.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-1.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise. * gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise. * gcc.dg/analyzer/attr-alloc_size-1.c: New test. * gcc.dg/analyzer/attr-alloc_size-2.c: New test. * gcc.dg/analyzer/attr-alloc_size-3.c: New test. * gcc.dg/analyzer/explode-4.c: New test. * gcc.dg/analyzer/taint-size-1.c: Add test coverage for __attribute__ alloc_size. Signed-off-by: David Malcolm
[Bug tree-optimization/18437] vectorizer failed for matrix multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437

--- Comment #10 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #9)
> For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC
> can vectorize it on aarch64 but not on x86_64.

I should say starting in GCC 6.
[Bug analyzer/110426] Missing buffer overflow warning with function pointer that has the alloc_size attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110426

David Malcolm changed:

What       |Removed  |Added
-----------+---------+---------
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #3 from David Malcolm ---
Should be implemented for gcc 14 by the above patch.
[Bug tree-optimization/21998] (cond ? result1 : result2) is vectorized, where equivalent if-syntax isn't (store)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21998 --- Comment #7 from Andrew Pinski --- We can vectorize test2 using mask stores
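The test1/test2 functions from the bug are not quoted in this thread, so the loops below are hypothetical stand-ins that sketch the distinction the bug title draws. The ternary form performs one unconditional store per iteration, while the if forms place the store under control flow. For the one-armed form, a vectorized loop typically needs a masked store so that elements whose condition is false are left untouched.

```cpp
#include <cstddef>

// Ternary form: exactly one unconditional store per iteration.
void max_ternary(int* out, const int* a, const int* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
}

// If/else form: same result, but the store appears on two control-flow paths.
void max_if_else(int* out, const int* a, const int* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] > b[i])
            out[i] = a[i];
        else
            out[i] = b[i];
    }
}

// One-armed form: out[i] is written only when the condition holds, so a
// vectorized version must leave the other elements alone (masked store).
void store_if_greater(int* out, const int* a, const int* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] > b[i])
            out[i] = a[i];
    }
}
```

The first two functions compute the same values; only the third leaves some elements of `out` unmodified.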
[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905 --- Comment #2 from danakj at orodu dot net --- Ah ok. Here's a big reproduction: https://godbolt.org/z/Kj7Tcd6P4 /opt/compiler-explorer/gcc-trunk-20230804/include/c++/14.0.0/bits/stl_construct.h:97:14: in 'constexpr' expansion of '((sus::containers::VecIntoIter*))->sus::containers::VecIntoIter::VecIntoIter((* & std::forward >((* & __args#0' :32895:22: error: use of deleted function 'sus::option::__private::Storage, false>()' 32895 | struct [[nodiscard]] VecIntoIter final | ^~~ :3015:9: note: 'sus::option::__private::Storage, false>()' is implicitly deleted because the default definition would be ill-formed: 3015 | union { | ^ :3015:9: error: no matching function for call to 'sus::containers::VecIntoIter::VecIntoIter()' :32953:13: note: candidate: 'constexpr sus::containers::VecIntoIter::VecIntoIter(sus::containers::Vec&&, sus::num::usize, sus::num::usize) [with ItemT = sus::num::i32]' 32953 | constexpr VecIntoIter(Vec&& vec, usize front, usize back) noexcept | ^~~ :32953:13: note: candidate expects 3 arguments, 0 provided :32951:13: note: candidate: 'constexpr sus::containers::VecIntoIter::VecIntoIter(sus::containers::Vec&&) [with ItemT = sus::num::i32]' 32951 | constexpr VecIntoIter(Vec&& vec) noexcept : vec_(::sus::move(vec)) {} | ^~~ :32951:13: note: candidate expects 1 argument, 0 provided :32895:22: note: candidate: 'constexpr sus::containers::VecIntoIter::VecIntoIter(sus::containers::VecIntoIter&&)' 32895 | struct [[nodiscard]] VecIntoIter final | ^~~ :32895:22: note: candidate expects 1 argument, 0 provided Compiler returned: 1 I will try to shrink it now.
[Bug middle-end/110906] New: __attribute__((optimize("no-math-errno"))) has no effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

Bug ID: 110906
Summary: __attribute__((optimize("no-math-errno"))) has no effect.
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: cassio.neri at gmail dot com
Target Milestone: ---

Consider this C++ code compiled with -O3:

    double g(double x) { return std::sqrt(x); }

Usually this does call the library function std::sqrt because x might be negative and errno needs to be set accordingly. Moreover, with -fno-math-errno a single sqrtsd instruction is emitted.

However, annotating g with __attribute__((optimize("no-math-errno"))) has no effect. This attribute (and #pragma GCC optimize("no-math-errno")) used to work up to gcc 5.5.

https://godbolt.org/z/T1nb11bv5
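The errno side effect mentioned in the report can be observed directly. This is a minimal sketch, assuming a C library that reports math errors through errno (glibc does so by default). Both helper names are hypothetical and not part of the report, and the sketch does not exercise the optimize attribute itself.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper, not from the report: returns true when std::sqrt of a
// negative argument leaves errno set to EDOM.
bool sqrt_sets_errno_on_negative() {
    volatile double x = -1.0;  // volatile: keep the compiler from folding the call
    errno = 0;
    volatile double r = std::sqrt(x);
    (void)r;                   // the result (NaN) is not needed, only the side effect
    return errno == EDOM;
}

// True when the implementation advertises errno-based math error reporting.
bool libm_reports_via_errno() {
    return (math_errhandling & MATH_ERRNO) != 0;
}
```

When `libm_reports_via_errno()` is true, `sqrt_sets_errno_on_negative()` is expected to be true as well. That observable write to errno is what the compiler has to preserve unless math-errno is disabled.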
[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906 --- Comment #1 from Andrew Pinski --- well std::sqrt is not annotated with no-math-errno after all ...
[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906 --- Comment #2 from Andrew Pinski --- Created attachment 55692 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55692&action=edit Full testcase
[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

--- Comment #3 from Andrew Pinski ---
But even:
```
__attribute__((optimize("no-math-errno")))
double g(double x) { return __builtin_sqrt(x); }
```
Does not change here ...