[Bug tree-optimization/88970] ICE: verify_ssa failed (error: definition in block 2 follows the use)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88970

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška ---
Confirmed, it's very old (at least 4.9.0).
[Bug c++/88967] [9 regression] openmp default(none) broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88967

--- Comment #4 from Roman Lebedev ---
(In reply to Roman Lebedev from comment #3)
> While there, any advice on how that is supposed to be rewritten?
> Simply adding "shared(begin, len)" makes older gcc's unhappy
> https://godbolt.org/z/gyZBR-
> Only keeping "shared(begin, len)" (and dropping "default(none)") does not
> work either.

Right, "firstprivate(begin, len)" works while being backward compatible, sorry
for panicking too early.
https://godbolt.org/z/tEYKIq
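The workaround from comment #4 can be sketched as follows. This is only an illustration, not the reporter's code: the function sum_range, its parameters and the reduction are hypothetical, and it assumes that begin and len are const-qualified locals, which is the case where GCC 9 and older releases disagree about listing them under default(none).

```c
#include <stddef.h>

/* Hypothetical example: sum len elements of data starting at begin.
   begin and len are const-qualified, so older GCC treats them as
   predetermined shared and rejects shared(begin, len), while GCC 9
   wants them listed explicitly under default(none).  Listing them in
   firstprivate is accepted by both.  */
long sum_range(const int *data, size_t begin_in, size_t len_in)
{
    const size_t begin = begin_in;
    const size_t len = len_in;
    long total = 0;

    /* Without -fopenmp the pragma is ignored and the loop runs serially. */
#pragma omp parallel for default(none) firstprivate(begin, len) \
    shared(data) reduction(+ : total)
    for (size_t i = 0; i < len; i++)
        total += data[begin + i];

    return total;
}
```

Each thread gets its own copy of begin and len, initialised from the originals, so the loop body reads the same values as it would with shared.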
[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2019-01-22
                 CC|                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška ---
Can't reproduce with current trunk, I only see:

g++ pr88969.cpp -c -std=c++2a -fchecking=1
pr88969.cpp:13:21: error: expected initializer at end of input
   13 |   void delete_B(B *b)
      |                     ^
pr88969.cpp:13:21: error: expected ‘}’ at end of input
pr88969.cpp:8:28: note: to match this ‘{’
    8 | namespace delete_selection {
      |                            ^
[Bug c/88968] [8/9 Regression] Stack overflow in gimplify_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88968

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
   Target Milestone|---                         |8.3
     Ever confirmed|0                           |1
      Known to fail|                            |8.2.0, 9.0

--- Comment #1 from Martin Liška ---
Confirmed, it's rejected with GCC 7.4.0:

pr88968.c: In function ‘yp’:
pr88968.c:9:9: error: cannot take address of bit-field ‘hq’
 #pragma omp atomic
         ^~~

so started with r250929.
[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |marxin at gcc dot gnu.org
          Component|c                           |preprocessor
     Ever confirmed|0                           |1

--- Comment #3 from Martin Liška ---
Confirmed, happens with all releases I have (4.8.0+).
[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2019-01-22
                 CC|                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Martin Liška ---
Can you please provide the full command line you use for the compilation?
I can't reproduce it with the snippet using a cross-compiler.
[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
Created attachment 45488
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45488&action=edit
gcc9-pr88965.patch

Untested fix.
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |amker at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
   Target Milestone|---                         |8.3
     Ever confirmed|0                           |1

--- Comment #2 from Martin Liška ---
Anyway, started with r255472.
[Bug d/88958] ICE in walk_aliased_vdefs_1, at tree-ssa-alias.c:2887
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88958

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |ibuclaw at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1
[Bug d/88957] ICE: Segmentation fault in tree_could_trap_p, at tree-eh.c:2672
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88957

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
                 CC|                            |ibuclaw at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1
[Bug c/88956] [9 Regression] ICE: Floating point exception
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88956

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|NEW                         |ASSIGNED
                 CC|                            |marxin at gcc dot gnu.org
      Known to work|                            |8.2.0
           Assignee|unassigned at gcc dot gnu.org      |msebor at gcc dot gnu.org
      Known to fail|                            |9.0

--- Comment #2 from Martin Liška ---
Started with r262522.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #21 from rguenther at suse dot de ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #19 from Chris Elrod ---
> To add a little more:
> I used inline asm for direct access to the rsqrt instruction "vrsqrt14ps" in
> Julia. Without adding a Newton step, the answers are wrong beyond just a
> couple significant digits.
> With the Newton step, the answers are correct.
>
> My point is that LLVM-compiled code (Clang/Flang/ispc) is definitely adding
> the Newton step. They get the correct answer.
>
> That leaves my best guess for the performance difference as owing to the
> masked "vrsqrt14ps" that gcc is using:
>
>         vcmpps  $4, %zmm0, %zmm5, %k1
>         vrsqrt14ps      %zmm0, %zmm1{%k1}{z}
>
> Is there any way for me to test that idea?
> Edit the asm to remove the vcmpps and mask, compile the asm with gcc, and
> benchmark it?

Usually it's easiest to compile to assembler with GCC (-S) and test
this kind of theory by editing the GCC-generated assembly and then
benchmarking that.  Just use the assembler file as input to the gfortran
compile command instead of the .f file when linking the program.
[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966

--- Comment #4 from Andrew Pinski ---
The same reason why:

#define mymacro 1
str(mymacro)
stringify(mymacro)

Gives different results.
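The comment does not show the definitions of str and stringify; the sketch below assumes the usual two-level idiom, where str applies the # operator directly and stringify goes through one extra level so that its argument is macro-expanded first. The helper functions direct_result and indirect_result are hypothetical and exist only to make the two results observable.

```c
/* Assumed definitions (not quoted in the bug report): */
#define str(x) #x                /* stringizes the argument as written */
#define stringify(x) str(x)      /* expands the argument, then stringizes */

#define mymacro 1

/* "#" suppresses expansion of its operand, so this yields "mymacro". */
const char *direct_result(void)
{
    return str(mymacro);
}

/* The argument of stringify is fully expanded before it is passed on to
   str, so this yields "1".  In GNU mode on Linux targets, "linux" is a
   predefined macro expanding to 1, which is why stringify(linux) gives
   "1" there as well.  */
const char *indirect_result(void)
{
    return stringify(mymacro);
}
```

Compiling with a strict ISO mode such as -std=c99 removes the non-reserved predefined name linux, while __linux__ stays defined.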
[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

--- Comment #3 from Arseny Solokha ---
--- mi9qy2yt.cpp        2019-01-22 15:51:33.410845340 +0700
+++ tbfkgb7c.cpp        2019-01-22 15:51:28.620898102 +0700
@@ -7,7 +7,7 @@
 namespace delete_selection {
   struct B {
     void operator delete(void*) = delete;
-    void operator delete(B *, std::destroying_delete_t) = delete;
+    void operator delete(void *, std::destroying_delete_t) = delete;
   };
   void delete_B(B *b) { delete b; }
 }

% g++-9.0.0-alpha20190120 -c tbfkgb7c.cpp
tbfkgb7c.cpp:10:62: internal compiler error: Segmentation fault
   10 |     void operator delete(void *, std::destroying_delete_t) = delete;
      |                                                              ^~
0xf9cb6f crash_signal
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/toplev.c:326
0xa5ad2f tree_class_check(tree_node*, tree_code_class, char const*, int, char const*)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/tree.h:3298
0xa5ad2f comptypes(tree_node*, tree_node*, int)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/typeck.c:1465
0x926cc7 coerce_delete_type(tree_node*, unsigned int)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl2.c:1776
0x8ff2ba grok_op_properties(tree_node*, bool)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:13472
0x90c3ac grokfndecl
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:9034
0x916b60 grokdeclarator(cp_declarator const*, cp_decl_specifier_seq*, decl_context, int, tree_node**)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:12424
0x92a54e grokfield(cp_declarator const*, cp_decl_specifier_seq*, tree_node*, bool, tree_node*, tree_node*)
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl2.c:814
0x9c13cf cp_parser_member_declaration
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:24656
0x999f9f cp_parser_member_specification_opt
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:24129
0x999f9f cp_parser_class_specifier_1
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:23273
0x99bc98 cp_parser_class_specifier
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:23535
0x99bc98 cp_parser_type_specifier
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:17356
0x99cc50 cp_parser_decl_specifier_seq
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:14049
0x99d424 cp_parser_simple_declaration
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13354
0x9c2cdd cp_parser_declaration
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13173
0x9c389c cp_parser_declaration_seq_opt
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13049
0x9c389c cp_parser_namespace_body
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:19252
0x9c389c cp_parser_namespace_definition
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:19230
0x9c2df0 cp_parser_declaration
        /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13153
[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

--- Comment #2 from Arseny Solokha ---
I get what you see when I modify the testcase from comment 0 the following way:

--- mi9qy2yt.cpp        2019-01-22 15:48:53.473604944 +0700
+++ r9d6mwt2.cpp        2019-01-22 15:46:45.567008369 +0700
@@ -1,3 +1,4 @@
+
 namespace std {
   struct destroying_delete_t {
     struct __construct { explicit __construct() = default; };
@@ -9,5 +10,5 @@
     void operator delete(void*) = delete;
     void operator delete(B *, std::destroying_delete_t) = delete;
   };
-  void delete_B(B *b) { delete b; }
-}
+  void delete_B(B *b)
+

Looks like a copy-paste error?
[Bug fortran/35476] Accepts invalid: USE/host association of generics with same specifics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35476

Jürgen Reuter changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juergen.reuter at desy dot de

--- Comment #9 from Jürgen Reuter ---
This is still present and not caught by gfortran; according to the interp from
J3, the code is invalid.
[Bug gcov-profile/88924] [GCOV] Wrong frequencies when there is complicated if expressions in gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88924

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P5
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-22
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška ---
Confirmed, it's related to some subexpression folding. Thus low priority to
fix.
[Bug fortran/35779] error pointer wrong in PARAMETER
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35779

Jürgen Reuter changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juergen.reuter at desy dot de

--- Comment #13 from Jürgen Reuter ---
Still present in trunk.
[Bug gcov-profile/88913] [GCOV] Wrong frequencies when a global variable is in a while expression in gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88913

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Martin Liška ---
Fixed on trunk in r247374.
[Bug gcov-profile/88913] [GCOV] Wrong frequencies when a global variable is in a while expression in gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88913

--- Comment #2 from Yibiao Yang ---
(In reply to Martin Liška from comment #1)
> Fixed on trunk in r247374.

Thanks.
[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from Martin Liška ---
Confirmed, started with r266053.
[Bug rtl-optimization/49429] [4.7 Regression] dse.c change (r175063) causes execution failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49429

--- Comment #21 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
        PR rtl-optimization/49429
        PR target/49454
        PR rtl-optimization/86334
        PR target/88906
        * expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
        addressable from here...
        (emit_block_op_via_libcall): ... to here.

        * gcc.target/i386/pr86334.c: New test.
        * gcc.target/i386/pr88906.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr86334.c
    trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/86334] wrong code with -march=athlon -mmemcpy-strategy=libcall:-1:noalign
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86334

--- Comment #4 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
        PR rtl-optimization/49429
        PR target/49454
        PR rtl-optimization/86334
        PR target/88906
        * expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
        addressable from here...
        (emit_block_op_via_libcall): ... to here.

        * gcc.target/i386/pr86334.c: New test.
        * gcc.target/i386/pr88906.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr86334.c
    trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
[Bug target/49454] [4.7 Regression] /usr/include/libio.h:336:3: internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49454

--- Comment #9 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
        PR rtl-optimization/49429
        PR target/49454
        PR rtl-optimization/86334
        PR target/88906
        * expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
        addressable from here...
        (emit_block_op_via_libcall): ... to here.

        * gcc.target/i386/pr86334.c: New test.
        * gcc.target/i386/pr88906.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr86334.c
    trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
[Bug target/88906] wrong code with -march=k6 -minline-all-stringops -minline-stringops-dynamically -mmemcpy-strategy=libcall:-1:align and vector argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88906

--- Comment #8 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
        PR rtl-optimization/49429
        PR target/49454
        PR rtl-optimization/86334
        PR target/88906
        * expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
        addressable from here...
        (emit_block_op_via_libcall): ... to here.

        * gcc.target/i386/pr86334.c: New test.
        * gcc.target/i386/pr88906.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr86334.c
    trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
[Bug fortran/35718] deallocating non-allocated pointer target does not fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35718

--- Comment #6 from Jürgen Reuter ---
Still present in trunk.
[Bug target/88905] [8/9 Regression] ICE: in decompose, at rtl.h:2253 with -mabm and __builtin_popcountll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88905

--- Comment #5 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:11:35 2019
New Revision: 268139

URL: https://gcc.gnu.org/viewcvs?rev=268139&root=gcc&view=rev
Log:
        PR target/88905
        * optabs.c (add_equal_note): Add op0_mode argument, use it instead of
        GET_MODE (op0).
        (expand_binop_directly, expand_doubleword_clz,
        expand_doubleword_popcount, expand_ctz, expand_ffs,
        expand_unop_direct, maybe_emit_unop_insn): Adjust callers.

        * gcc.dg/pr88905.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr88905.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/optabs.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

--- Comment #5 from Jakub Jelinek ---
Author: jakub
Date: Tue Jan 22 09:12:31 2019
New Revision: 268140

URL: https://gcc.gnu.org/viewcvs?rev=268140&root=gcc&view=rev
Log:
        PR rtl-optimization/88904
        * cfgcleanup.c (thread_jump): Verify cond2 doesn't mention
        any nonequal registers before processing BB_END (b).

        * gcc.c-torture/execute/pr88904.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr88904.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c
    trunk/gcc/testsuite/ChangeLog
[Bug target/88905] [8 Regression] ICE: in decompose, at rtl.h:2253 with -mabm and __builtin_popcountll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88905

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[8/9 Regression] ICE: in    |[8 Regression] ICE: in
                   |decompose, at rtl.h:2253    |decompose, at rtl.h:2253
                   |with -mabm and              |with -mabm and
                   |__builtin_popcountll        |__builtin_popcountll

--- Comment #6 from Jakub Jelinek ---
Fixed on the trunk so far.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #5 from Christopher Leonard ---
Is the order at least consistent with x86-32? I.e., if you give a 64-bit
input operand to inline assembly, is the order hi:lo? I'm worried this is a
bizarre convention imposed on big-endian architectures.
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |jakub at gcc dot gnu.org
             Blocks|                            |88670
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
Confirmed.  The reason is that vector lowering only lowers the arithmetic
but leaves the loads and stores alone:

   _1 = *b_5(D);
   _2 = *c_6(D);
-  _3 = _1 + _2;
+  _9 = BIT_FIELD_REF <_1, 128, 0>;
+  _10 = BIT_FIELD_REF <_2, 128, 0>;
+  _11 = _9 + _10;
+  _12 = BIT_FIELD_REF <_1, 128, 128>;
+  _13 = BIT_FIELD_REF <_2, 128, 128>;
+  _14 = _12 + _13;
+  _15 = BIT_FIELD_REF <_1, 128, 256>;
+  _16 = BIT_FIELD_REF <_2, 128, 256>;
+  _17 = _15 + _16;
+  _18 = BIT_FIELD_REF <_1, 128, 384>;
+  _19 = BIT_FIELD_REF <_2, 128, 384>;
+  _20 = _18 + _19;
+  _3 = {_11, _14, _17, _20};
   *a_7(D) = _3;

There's some hack^Wcode in tree-ssa-forwprop.c to deal with similar cases
using {REAL,IMAG}PART_EXPR and COMPLEX_EXPR, splitting feeding/destination
memory accesses.  The same trick is missing for vector loads/stores.

OTOH it would be more reasonable for vector lowering to split the loads.
It's not so difficult to do - the main "issue" would be making sure
the wide vector load goes away (or maybe that's even a secondary issue
that could be ignored).

With just the loads handled code generation improves to

test:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $8, %rsp
        movdqa  (%rsi), %xmm3
        movdqa  16(%rsi), %xmm2
        movdqa  32(%rsi), %xmm1
        movdqa  48(%rsi), %xmm0
        paddd   (%rdx), %xmm3
        paddd   16(%rdx), %xmm2
        paddd   32(%rdx), %xmm1
        paddd   48(%rdx), %xmm0
        movaps  %xmm3, (%rdi)
        movaps  %xmm2, 16(%rdi)
        movaps  %xmm1, 32(%rdi)
        movaps  %xmm0, 48(%rdi)
        leave
        ret
        .cfi_endproc

for SSE2 and

test:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $8, %rsp
        vmovdqa (%rsi), %ymm3
        vpaddd  (%rdx), %ymm3, %ymm0
        vmovdqa %xmm0, %xmm2
        vmovdqa %ymm0, -120(%rsp)
        vmovdqa 32(%rsi), %ymm0
        vmovdqa -104(%rsp), %xmm4
        vpaddd  32(%rdx), %ymm0, %ymm0
        vmovaps %xmm2, (%rdi)
        vmovdqa %ymm0, -88(%rsp)
        vmovdqa -72(%rsp), %xmm5
        vmovaps %xmm4, 16(%rdi)
        vmovaps %xmm0, 32(%rdi)
        vmovaps %xmm0, 32(%rdi)
        vmovaps %xmm5, 48(%rdi)
        vzeroupper
        leave
        .cfi_def_cfa 7, 8
        ret

for skylake.  Not sure why we spill anything with the above, with the
SSE code we manage to elide the spills (but not the stack reservation).

I guess we need to handle the stores as well.

The odd thing is that if I simply do

  _12 = BIT_FIELD_REF <*b_5(D), 256, 256>;
  _9 = BIT_FIELD_REF <*b_5(D), 256, 0>;
  _13 = BIT_FIELD_REF <*c_6(D), 256, 256>;
  _10 = BIT_FIELD_REF <*c_6(D), 256, 0>;
  _11 = _9 + _10;
  _14 = _12 + _13;
  BIT_FIELD_REF <*a_7(D), 256, 0> = _11;
  BIT_FIELD_REF <*a_7(D), 256, 256> = _14;

code-generation is even worse:

        vmovdqa (%rsi), %ymm0
        vmovdqa 32(%rsi), %ymm2
        vpaddd  (%rdx), %ymm0, %ymm3
        vpaddd  32(%rdx), %ymm2, %ymm1
        vmovdqa %ymm3, -64(%rsp)
        movq    -56(%rsp), %rax
        vmovdqa %ymm1, -32(%rsp)
        movq    %rax, 8(%rdi)
        movq    -48(%rsp), %rax
        vmovdqa -64(%rsp), %xmm0
        movq    %rax, 16(%rdi)
        movq    -40(%rsp), %rax
        vmovq   %xmm0, (%rdi)
        movq    %rax, 24(%rdi)
        movq    -24(%rsp), %rax
        vmovdqa -32(%rsp), %xmm0
        movq    %rax, 40(%rdi)
        movq    -16(%rsp), %rax
        vmovq   %xmm0, 32(%rdi)
        movq    %rax, 48(%rdi)
        movq    -8(%rsp), %rax
        movq    %rax, 56(%rdi)
        vzeroupper
        leave
        .cfi_def_cfa 7, 8
        ret

the stores expand to

;; BIT_FIELD_REF <*a_7(D), 256, 0> = _11;

(insn 14 13 15 (set (mem/j:DI (reg/v/f:DI 88 [ a ]) [1 *a_7(D)+0 S8 A256])
        (subreg:DI (reg:V8SI 84 [ _11 ]) 0)) "t.c":6:6 -1
     (nil))

(insn 15 14 16 (set (mem/j:DI (plus:DI (reg/v/f:DI 88 [ a ])
                (const_int 8 [0x8])) [1 *a_7(D)+8 S8 A64])
        (subreg:DI (reg:V8SI 84 [ _11 ]) 8)) "t.c":6:6 -1
     (nil))

(insn 16 15 17 (set (mem/j:DI (plus:DI (reg/v/f:DI 88 [ a ])
                (const_int 16 [0x10])) [1 *a_7(D)+16 S8 A128])
        (subreg:DI (reg:V8SI 84 [ _11 ]) 16)) "t.c":6:6 -1
     (nil))

(insn 17 16 0 (set (mem/j:DI (plus:DI (reg/v/f:DI 88
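
The test case is not quoted in this message. The sketch below is a reconstruction from the GIMPLE above (two loads from b and c, one add, one store to a), assuming the GNU C vector extension with a 64-byte vector of int; the type names v16si and v4si and the function test_split are hypothetical. test_split shows by hand what splitting the wide loads and stores into 128-bit pieces amounts to.

```c
#include <string.h>

/* GNU C vector extension (GCC and Clang): 16 ints = 512 bits. */
typedef int v16si __attribute__((vector_size(64)));
/* A 128-bit piece, the width SSE2 supports natively. */
typedef int v4si __attribute__((vector_size(16)));

/* Presumed shape of the reported function: one wide load per operand,
   one wide add, one wide store.  */
void test(v16si *a, v16si *b, v16si *c)
{
    *a = *b + *c;
}

/* Hand-split equivalent: each 128-bit piece is loaded, added and stored
   on its own, so no 512-bit temporary has to live on the stack.  */
void test_split(v16si *a, v16si *b, v16si *c)
{
    for (int i = 0; i < 4; i++) {
        v4si x, y, sum;
        memcpy(&x, (const char *)b + 16 * i, sizeof x);
        memcpy(&y, (const char *)c + 16 * i, sizeof y);
        sum = x + y;
        memcpy((char *)a + 16 * i, &sum, sizeof sum);
    }
}
```

Both functions compute the same result; comparing their -O2 -S output on a target without 512-bit registers is one way to see the effect discussed in the comment.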
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #2 from Richard Biener ---
Created attachment 45489
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45489&action=edit
untested patch

forwprop patch I was playing with.
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #3 from Jakub Jelinek ---
Yeah, I've noticed that already when working on __builtin_convertvector, we
don't really do much TER for the oversized vector SSA_NAMEs and force them
onto the stack all the time.
Wonder if we couldn't do a kind of SRA for these vectors after generic vector
lowering to split them into multiple unrelated SSA_NAMEs if possible.
[Bug c++/88951] [9 Regression] No fpermissive offerred on 'error: jump to case label'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88951

--- Comment #1 from Paolo Carlini ---
The rationale for the change is here:

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00623.html

In my experience, accepting this kind of code is really dangerous, because
-fpermissive isn't fine-grained, thus in some cases users want to pass it to
allow for other *safe* legacy constructs.

That said, having clarified that I vote NO, NO based on real, hard experience,
it's the front-end maintainers' call; reverting the change would be trivial.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #6 from Andrew Pinski ---
(In reply to Christopher Leonard from comment #5)
> Is the order at least consistent with x86-32? I.e., if you give a 64-bit
> input operand to inline assembly, is the order hi:lo? I'm worried this is a
> bizarre convention imposed on big-endian architectures.

Yes, the order is always hi:lo (reg:reg+1) on all targets I know of;
endianness only matters when it comes to memory.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #22 from Chris Elrod ---
Okay. I did that, and the time went from about 4.25 microseconds down to 4.0
microseconds. So that is an improvement, but accounts for only a small part of
the difference with the LLVM-compilers.

-O3 -fno-math-errno

was about 3.5 microseconds, so -funsafe-math-optimizations still results in a
regression in this code.

3.5 microseconds is roughly as fast as you can get with vsqrt and div.

My best guess now is that gcc does a lot more to improve the accuracy of vsqrt.
If I understand correctly, these are all the involved instructions:

        vmovaps .LC2(%rip), %zmm7
        vmovaps .LC3(%rip), %zmm6
        # for loop begins
        vrsqrt14ps      %zmm1, %zmm2 # comparison and mask removed
        vmulps  %zmm1, %zmm2, %zmm0
        vmulps  %zmm2, %zmm0, %zmm1
        vmulps  %zmm6, %zmm0, %zmm0
        vaddps  %zmm7, %zmm1, %zmm1
        vmulps  %zmm0, %zmm1, %zmm1
        vrcp14ps        %zmm1, %zmm0
        vmulps  %zmm1, %zmm0, %zmm1
        vmulps  %zmm1, %zmm0, %zmm1
        vaddps  %zmm0, %zmm0, %zmm0
        vsubps  %zmm1, %zmm0, %zmm0
        vfnmadd213ps    (%r10,%rax), %zmm0, %zmm2

If I understand this correctly:

zmm2 =(approx) 1 / sqrt(zmm1)
zmm0 = zmm1 * zmm2 = (approx) sqrt(zmm1)
zmm1 = zmm0 * zmm2 = (approx) 1
zmm0 = zmm6 * zmm0 = (approx) constant6 * sqrt(zmm1)
zmm1 = zmm7 * zmm1 = (approx) constant7
zmm1 = zmm0 * zmm1 = (approx) constant6 * constant6 * sqrt(zmm1)
zmm0 = (approx) 1 / zmm1 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
zmm1 = zmm1 * zmm0 = (approx) 1
zmm1 = zmm1 * zmm0 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
zmm0 = 2 * zmm0 = (approx) 2 / sqrt(zmm1) * 2 / (constant6 * constant7)
zmm0 = zmm1 - zmm0 = (approx) -1 / sqrt(zmm1) * 1 / (constant6 * constant7)

which implies that constant6 * constant6 = approximately -1?

LLVM seems to do a much simpler / briefer update of the output of vrsqrt.

When I implemented a vrsqrt intrinsic in a Julia library, I just looked at
Wikipedia and did (roughly):

constant1 = -0.5
constant2 = 1.5

zmm2 = (approx) 1 / sqrt(zmm1)
zmm3 = constant1 * zmm1
zmm1 = zmm2 * zmm2
zmm3 = zmm3 * zmm1 + constant2
zmm2 = zmm2 * zmm3

I am not a numerical analyst, so I can't comment on relative validities or
accuracies of these approaches.
I also don't know what LLVM 7+ does. LLVM 6 doesn't use vrsqrt.

I would be interested in reading explanations or discussions, if any are
available.
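
The five-step update above is one Newton-Raphson iteration, y' = y * (1.5 - 0.5 * a * y * y). A scalar C sketch follows, under two assumptions that are not from the comment: rough_rsqrt is a hypothetical stand-in for the hardware estimate (it uses the well-known integer bit trick rather than vrsqrt14ps, so its accuracy differs), and single precision is used throughout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware estimate: a crude approximation
   of 1/sqrt(a) for positive, normal a, accurate to a few percent.  */
float rough_rsqrt(float a)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &a, sizeof bits);     /* well-defined type punning */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step, written in the same order as the steps in the
   comment: t = constant1 * a; s = y * y; t = t * s + constant2; y = y * t.  */
float newton_step_rsqrt(float a, float y)
{
    const float constant1 = -0.5f;
    const float constant2 = 1.5f;
    float t = constant1 * a;
    float s = y * y;
    t = t * s + constant2;
    return y * t;
}
```

Each step roughly squares the relative error of the estimate, which is why a single step is usually enough after a 14-bit hardware estimate.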
[Bug fortran/37222] [OOP] Checks when overriding type-bound procedures are incomplete
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37222

Jürgen Reuter changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juergen.reuter at desy dot de

--- Comment #4 from Jürgen Reuter ---
As Janus commented there is just one left-over (already fixed in the past six
years?). So what is really left to do here?
[Bug fortran/37398] Statement functions mask missing PURE procedures.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37398

Jürgen Reuter changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juergen.reuter at desy dot de

--- Comment #3 from Jürgen Reuter ---
This correctly gives the expected error messages since at least gfortran 5.4.
Closing as FIXED?
[Bug fortran/38113] on warning/error: skip whitespaces, move position marker to actual variable name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38113

Jürgen Reuter changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juergen.reuter at desy dot de

--- Comment #9 from Jürgen Reuter ---
Here there are some problems that have been fixed, and some new ones have been
revealed!? To me it is not clear what the exact context is now. Maybe closing
as WORKSFORME, and waiting for someone to open an actual issue with the
alignment of markers?
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #23 from rguenther at suse dot de ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #22 from Chris Elrod ---
> Okay. I did that, and the time went from about 4.25 microseconds down to 4.0
> microseconds. So that is an improvement, but accounts for only a small part
> of the difference with the LLVM-compilers.
>
> -O3 -fno-math-errno
>
> was about 3.5 microseconds, so -funsafe-math-optimizations still results in a
> regression in this code.
>
> 3.5 microseconds is roughly as fast as you can get with vsqrt and div.
>
> My best guess now is that gcc does a lot more to improve the accuracy of
> vsqrt.
> If I understand correctly, these are all the involved instructions:
>
>         vmovaps .LC2(%rip), %zmm7
>         vmovaps .LC3(%rip), %zmm6
>         # for loop begins
>         vrsqrt14ps      %zmm1, %zmm2 # comparison and mask removed
>         vmulps  %zmm1, %zmm2, %zmm0
>         vmulps  %zmm2, %zmm0, %zmm1
>         vmulps  %zmm6, %zmm0, %zmm0
>         vaddps  %zmm7, %zmm1, %zmm1
>         vmulps  %zmm0, %zmm1, %zmm1
>         vrcp14ps        %zmm1, %zmm0
>         vmulps  %zmm1, %zmm0, %zmm1
>         vmulps  %zmm1, %zmm0, %zmm1
>         vaddps  %zmm0, %zmm0, %zmm0
>         vsubps  %zmm1, %zmm0, %zmm0
>         vfnmadd213ps    (%r10,%rax), %zmm0, %zmm2
>
> If I understand this correctly:
>
> zmm2 =(approx) 1 / sqrt(zmm1)
> zmm0 = zmm1 * zmm2 = (approx) sqrt(zmm1)
> zmm1 = zmm0 * zmm2 = (approx) 1
> zmm0 = zmm6 * zmm0 = (approx) constant6 * sqrt(zmm1)
> zmm1 = zmm7 * zmm1 = (approx) constant7
> zmm1 = zmm0 * zmm1 = (approx) constant6 * constant6 * sqrt(zmm1)
> zmm0 = (approx) 1 / zmm1 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
> zmm1 = zmm1 * zmm0 = (approx) 1
> zmm1 = zmm1 * zmm0 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
> zmm0 = 2 * zmm0 = (approx) 2 / sqrt(zmm1) * 2 / (constant6 * constant7)
> zmm0 = zmm1 - zmm0 = (approx) -1 / sqrt(zmm1) * 1 / (constant6 * constant7)
>
> which implies that constant6 * constant6 = approximately -1?

GCC implements

 /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
    rsqrt(a) = -0.5     * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */

which looks similar to what LLVM does.  You can look at the
-fdump-tree-optimized dump to see if there's anything fishy.

>
> LLVM seems to do a much simpler / briefer update of the output of vrsqrt.
>
> When I implemented a vrsqrt intrinsic in a Julia library, I just looked at
> Wikipedia and did (roughly):
>
> constant1 = -0.5
> constant2 = 1.5
>
> zmm2 = (approx) 1 / sqrt(zmm1)
> zmm3 = constant1 * zmm1
> zmm1 = zmm2 * zmm2
> zmm3 = zmm3 * zmm1 + constant2
> zmm2 = zmm2 * zmm3
>
>
> I am not a numerical analyst, so I can't comment on relative validities or
> accuracies of these approaches.
> I also don't know what LLVM 7+ does. LLVM 6 doesn't use vrsqrt.
>
> I would be interested in reading explanations or discussions, if any are
> available.
>
>
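
The two formulas in the GCC comment are the same Newton-Raphson step as the textbook form y * (1.5 - 0.5 * a * y * y), only factored differently. A scalar sketch, with hypothetical function names and with the hardware estimate rsqrtss(a) passed in as the parameter y0 so the code does not depend on any particular instruction:

```c
/* rsqrt(a) = -0.5 * y0 * (a * y0 * y0 - 3.0), where y0 stands for the
   hardware estimate rsqrtss(a).  */
float gcc_style_rsqrt(float a, float y0)
{
    return -0.5f * y0 * (a * y0 * y0 - 3.0f);
}

/* sqrt(a) = -0.5 * a * y0 * (a * y0 * y0 - 3.0): the same step with one
   extra multiplication by a, since sqrt(a) = a * rsqrt(a).  */
float gcc_style_sqrt(float a, float y0)
{
    return -0.5f * a * y0 * (a * y0 * y0 - 3.0f);
}

/* Textbook form of the same step, for comparison.  Expanding
   -0.5 * y0 * (a * y0 * y0 - 3.0) gives y0 * (1.5 - 0.5 * a * y0 * y0).  */
float textbook_rsqrt(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

For a = 4 and an estimate y0 = 0.48, both rsqrt forms give about 0.4988 (the exact value is 0.5), differing from each other only by rounding.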
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

--- Comment #6 from Nidal Faour ---
Andrew Pinski is right. After chasing this bug with the help of Andrew Burgess,
the problem is in the file simple-object.c, in the call to creat:

  outfd = creat (dest, 00777);

The creat function wraps the open function but does not pass the open mode.
The fix mentioned by Andrew was as follows:

When opening output files for simple-object creation, we must ensure
that the file is opened in binary mode.  Failure to do so causes file
corruption, and LTO failure on Windows targets.

libiberty/ChangeLog:

        PR lto/88422
        * simple-object.c (O_BINARY): Define if not already defined.
        (simple_object_copy_lto_debug_sections): Create file in binary
        mode.
---
 libiberty/ChangeLog       | 7 +++
 libiberty/simple-object.c | 6 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
index c1f38cee8ee..e061073abd1 100644
--- a/libiberty/simple-object.c
+++ b/libiberty/simple-object.c
@@ -44,6 +44,10 @@ Boston, MA 02110-1301, USA.  */
 #define SEEK_SET 0
 #endif

+#ifndef O_BINARY
+# define O_BINARY 0
+#endif
+
 #include "simple-object-common.h"

 /* The known object file formats.  */
@@ -349,7 +353,7 @@ simple_object_copy_lto_debug_sections (simple_object_read *sobj,
       return errmsg;
     }

-  outfd = creat (dest, 00777);
+  outfd = open (dest, O_CREAT|O_WRONLY|O_TRUNC|O_BINARY, 00777);
   if (outfd == -1)
     {
       *err = errno;
--
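The pattern used by the patch can be shown in isolation. This is a minimal sketch, not libiberty code: write_file_binary is a hypothetical helper, and it assumes a POSIX-style environment where open, write and close come from fcntl.h and unistd.h (MinGW provides these too). On systems without text/binary distinction O_BINARY does not exist, so it is defined to 0 and the flag becomes a no-op.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Same fallback as in the patch: harmless where the flag is unknown. */
#ifndef O_BINARY
# define O_BINARY 0
#endif

/* Hypothetical helper: create or truncate dest and write len bytes to it
   in binary mode, so that no newline translation corrupts the data on
   Windows.  Returns the number of bytes written, or -1 on error.  */
long write_file_binary(const char *dest, const void *buf, size_t len)
{
    /* creat (dest, mode) is equivalent to open with
       O_CREAT|O_WRONLY|O_TRUNC, which leaves no way to add O_BINARY.  */
    int outfd = open(dest, O_CREAT | O_WRONLY | O_TRUNC | O_BINARY, 0777);
    if (outfd == -1)
        return -1;

    ssize_t written = write(outfd, buf, len);
    if (close(outfd) == -1)
        return -1;

    return (long)written;
}
```

With text mode on Windows, a buffer containing the byte 0x0A would be written as 0x0D 0x0A, which is the kind of corruption the commit message describes.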
[Bug tree-optimization/88919] New test case gcc.dg/vect/pr88903-1.c in r268076 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88919 --- Comment #2 from Richard Biener --- Sandra posted a patch that will probably fix this (out-of-bound shift values).
[Bug c++/88951] [9 Regression] No fpermissive offerred on 'error: jump to case label'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88951 --- Comment #2 from Paolo Carlini --- Also note, further clarifying what I said in the linked messages, that we only temporarily, for a few releases, accepted this kind of broken code with -fpermissive: before gcc5, -fpermissive suppressed the first error, but then an additional hard error was emitted anyway.
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 Devin Hussey changed: What|Removed |Added CC||husseydevin at gmail dot com --- Comment #4 from Devin Hussey --- Strangely, this doesn't seem to affect the ARM or aarch64 backends, although I am on a December build (specifically Dec 29). 8.2 is also unaffected. arm-none-eabi-gcc -mfloat-abi=hard -mfpu=neon -march=armv7-a -O3 -S test.c test: vldmia r1, {d0-d7} vldmia r2, {d24-d31} vadd.i32 q8, q0, q12 vadd.i32 q9, q1, q13 vadd.i32 q10, q2, q14 vadd.i32 q11, q3, q15 vstmia r0, {d16-d23} bx lr aarch64-none-eabi-gcc -O3 -S test.c test: ld1 {v16.16b - v19.16b}, [x1] ld1 {v4.16b - v7.16b}, [x2] add v0.4s, v16.4s, v4.4s add v1.4s, v17.4s, v5.4s add v2.4s, v18.4s, v6.4s add v3.4s, v19.4s, v7.4s st1 {v0.16b - v3.16b}, [x0] ret Amusingly, Clang trunk for ARMv7-a has a similar issue (aarch64 is fine). test: .fnstart .save {r11, lr} push {r11, lr} add r3, r1, #48 mov lr, r1 mov r12, r2 vld1.64 {d20, d21}, [r3] add r3, r2, #48 add r1, r1, #32 vld1.32 {d16, d17}, [lr]! vld1.32 {d18, d19}, [r12]! vadd.i32 q8, q9, q8 vld1.64 {d22, d23}, [r3] vadd.i32 q10, q11, q10 vld1.64 {d26, d27}, [r1] add r1, r2, #32 vld1.64 {d28, d29}, [r1] add r1, r0, #48 vadd.i32 q11, q14, q13 vld1.64 {d24, d25}, [lr] vld1.64 {d18, d19}, [r12] vadd.i32 q9, q9, q12 vst1.64 {d20, d21}, [r1] add r1, r0, #32 vst1.32 {d16, d17}, [r0]! vst1.64 {d22, d23}, [r1] vst1.64 {d18, d19}, [r0] pop {r11, pc}
[Bug fortran/37222] [OOP] Checks when overriding type-bound procedures are incomplete
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37222 janus at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from janus at gcc dot gnu.org --- (In reply to Jürgen Reuter from comment #4) > As Janus commented there is just one left-over (already fixed in the past > six years?). So what is really left to do here? I don't think the left-over is actually fixed (at least the FIXME notes are still present in interface.c). In any case, further improvement in this area is rather hard and yields only little gain, so I think it's reasonable to close this ten-year-old PR that presents no concrete test case (after all, the FIXMEs are still there for future reference).
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952 --- Comment #7 from Uroš Bizjak --- (In reply to Christopher Leonard from comment #5) > Is the order at least consistent with x86-32? i.e. if you give a 64-bit > input operand to inline assembly the order is hi:lo? I'm worried this is a > bizarre convention imposed on high endian architectures. On x86, we don't allow register pairs in asm at all. Please see print_reg, where: switch (msize) { case 16: case 12: case 8: if (GENERAL_REGNO_P (regno) && msize > GET_MODE_SIZE (word_mode)) warning (0, "unsupported size for integer register"); /* FALLTHRU */ So, if someone wants to handle DImode on 32bit targets, both registers have to be passed to assembly explicitly, using "(int) lval" and "(int) (lval >> 32)".
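As an illustration of the advice above, here is a portable sketch of splitting a 64-bit value into two explicit 32-bit halves before handing them to inline asm. The helper name split_u64 is hypothetical, and the asm statement itself is left out so the example does not depend on any particular target.

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit value into explicit 32-bit halves
   so that each half can be bound to its own asm operand instead of
   relying on an implicit register pair. */
static void split_u64(uint64_t lval, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t) lval;          /* corresponds to "(int) lval" */
    *hi = (uint32_t) (lval >> 32);  /* corresponds to "(int) (lval >> 32)" */
}
```

Each half would then be passed as a separate input operand, which also makes the operand size explicit and so avoids the silent change from a 32-bit to a 64-bit operand when a constant in the expression later becomes wider.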
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 --- Comment #5 from Andrew Pinski --- (In reply to Devin Hussey from comment #4) > Strangely, this doesn't seem to affect the ARM or aarch64 backends, although > I am on a December build (specifically Dec 29). 8.2 is also unaffected. This is because those backends support very wide integer modes (OI, etc.). > aarch64-none-eabi-gcc -O3 -S test.c > > test: > ld1 {v16.16b - v19.16b}, [x1] > ld1 {v4.16b - v7.16b}, [x2] > add v0.4s, v16.4s, v4.4s > add v1.4s, v17.4s, v5.4s > add v2.4s, v18.4s, v6.4s > add v3.4s, v19.4s, v7.4s > st1 {v0.16b - v3.16b}, [x0] > ret This is not really that good code either on most if not all micro-architectures of ARMv8. Doing 8 ldr/ld1 and 4 st1 is almost always better.
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2019-01-22 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #7 from Richard Biener --- Thanks for catching this! I'll apply the patch.
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to work||8.2.1, 9.0 Resolution|--- |FIXED Known to fail||8.2.0 --- Comment #9 from Richard Biener --- Fixed.
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 --- Comment #8 from Richard Biener --- Author: rguenth Date: Tue Jan 22 09:47:52 2019 New Revision: 268141 URL: https://gcc.gnu.org/viewcvs?rev=268141&root=gcc&view=rev Log: 2019-01-22 Nidal Faour PR lto/88422 * simple-object.c (O_BINARY): Define if not already defined. (simple_object_copy_lto_debug_sections): Create file in binary mode. Modified: trunk/libiberty/ChangeLog trunk/libiberty/simple-object.c
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 --- Comment #10 from Richard Biener --- Author: rguenth Date: Tue Jan 22 09:49:27 2019 New Revision: 268142 URL: https://gcc.gnu.org/viewcvs?rev=268142&root=gcc&view=rev Log: 2019-01-22 Nidal Faour PR lto/88422 * simple-object.c (O_BINARY): Define if not already defined. (simple_object_copy_lto_debug_sections): Create file in binary mode. Modified: branches/gcc-8-branch/libiberty/ChangeLog branches/gcc-8-branch/libiberty/simple-object.c
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 --- Comment #6 from Andrew Pinski --- Try using 128 (or 256) and you might see that aarch64 falls down similarly.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952 --- Comment #8 from Andreas Schwab --- reg:reg+1 maps to lo:hi on x86.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952 --- Comment #9 from Christopher Leonard --- (In reply to Andrew Pinski from comment #6) > Yes the order is always hi:lo (reg:reg+1) on all targets I know of This is definitely not the natural choice (on any platform: I agree, endianness is irrelevant here) so I would recommend documenting this as well, and potentially recommending in the docs to explicitly cast e.g. a parameter for a function-style macro used as an input operand expression for inline asm (%L0 is no help when the size is unknown; it seems to select the "next" register when you give a 32-bit type, which isn't even loaded with a value in the generated PPC assembly). This is how the code messed up for me: I wrote a macro function to generate MTSPR instructions for a given SPR and load value (this is needed since the SPR number used in MTSPR is an immediate; there is no alternative where you can take the SPR from a register). One of the constants I used in the calculation of an SPR's load value became a 64-bit type in a later code change, making the input operand 64-bit instead of 32-bit, breaking my code.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952 --- Comment #10 from Christopher Leonard --- Getting contradictory statements now: >reg:reg+1 maps to lo:hi on x86. >On x86, we don't allow register pairs in asm at all. Not allowing, or printing a warning, is much better behavior than what I have been getting on PPC.
[Bug tree-optimization/88044] [9 regression] gfortran.dg/transfer_intrinsic_3.f90 hangs after r266171
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88044 --- Comment #15 from Jakub Jelinek --- Author: jakub Date: Tue Jan 22 09:58:23 2019 New Revision: 268143 URL: https://gcc.gnu.org/viewcvs?rev=268143&root=gcc&view=rev Log: PR tree-optimization/88044 * tree-ssa-loop-niter.c (number_of_iterations_cond): If condition is false in the first iteration, but !every_iteration, return false instead of true with niter->niter zero. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-loop-niter.c
[Bug tree-optimization/88044] [9 regression] gfortran.dg/transfer_intrinsic_3.f90 hangs after r266171
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88044 Jakub Jelinek changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #16 from Jakub Jelinek --- Fixed.
[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862 --- Comment #2 from Richard Biener --- Huh. We get &itarg1 here from originally (integer(kind=4)) &itarg1. The stmt we analyze is if (_4 != _316) I have a simple patch.
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 --- Comment #7 from Marc Glisse --- See PR 55266 (and several others).
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 --- Comment #8 from Richard Biener --- You can try the attached patch, it "fixes" the issue on the GIMPLE side but apparently the BIT_FIELD_REF stores go a weird path during RTL expansion and so we end up spilling again.
[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966 Jonathan Wakely changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #5 from Jonathan Wakely --- This is not a bug, "linux" is a predefined macro and the preprocessor is doing exactly what it's supposed to. See https://gcc.gnu.org/onlinedocs/cpp/System-specific-Predefined-Macros.html
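The behaviour described above can be sketched with the usual two-level stringification macros. The macro and function names here (STR, XSTR, stringify_direct, stringify_indirect) are chosen for illustration. In GNU modes on a Linux target, `linux` is predefined as 1, so the indirect form expands the argument first and yields "1"; the sketch removes the legacy macro with #undef so that both forms yield "linux".

```c
/* Remove the legacy predefined macro if the target defines it.
   #undef on a name that is not defined is harmless. */
#undef linux

#define STR(x)  #x       /* stringifies the argument as written */
#define XSTR(x) STR(x)   /* macro-expands the argument first, then stringifies */

static const char *stringify_direct(void)
{
    return STR(linux);   /* never expands its argument */
}

static const char *stringify_indirect(void)
{
    return XSTR(linux);  /* would be "1" if `linux` were still defined as 1 */
}
```

An alternative that needs no source change is to compile with -ansi or a strict -std option such as -std=c11, which suppresses the system-specific predefined macros outside the reserved namespace; `__linux__` remains available in that case.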
[Bug middle-end/88897] Bogus maybe-uninitialized warning on class field
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88897 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-22 Ever confirmed|0 |1 --- Comment #6 from Richard Biener --- So this boils down to a missed optimization (as many cases do...). The uninit warning sees [local count: 1073741825]: _3 = bar (); future_state::future_state (&_local_state); MEM[(struct &)&_local_state] ={v} {CLOBBER}; MEM[(struct optional *)&_local_state]._M_engaged = 0; MEM[(struct optional *)_3]._M_engaged = 0; _7 = MEM[(struct optional &)&_local_state]._M_engaged; if (_7 != 0) goto ; [50.00%] else goto ; [50.00%] [local count: 536870912]: _6 = MEM[(struct temporary_buffer &)&_local_state]._buffer; ... and warns about the load _6 = ... As you can see the condition isn't elided and somehow we didn't manage to CSE the load of _M_engaged here, possibly due to the apparent aliasing of the store via _3. points-to analysis explicitly says it might alias _local_state because _local_state escapes to future_state::future_state and PTA is not flow-sensitive: [local count: 1073741825]: # PT = nonlocal escaped null # USE = nonlocal null { D.2493 } (escaped) # CLB = nonlocal null { D.2493 } (escaped) _3 = bar (); # USE = nonlocal null { D.2493 } (escaped) # CLB = nonlocal null { D.2493 } (escaped) future_state::future_state (&_local_stateD.2493); MEM[(struct &)&_local_stateD.2493] ={v} {CLOBBER}; MEM[(struct optionalD.2409 *)&_local_stateD.2493]._M_engagedD.2426 = 0; MEM[(struct optionalD.2409 *)_3]._M_engagedD.2426 = 0;
[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Jakub Jelinek --- Fixed.
[Bug target/88906] wrong code with -march=k6 -minline-all-stringops -minline-stringops-dynamically -mmemcpy-strategy=libcall:-1:align and vector argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88906 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #9 from Jakub Jelinek --- Fixed on the trunk so far.
[Bug fortran/37398] Statement functions mask missing PURE procedures.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37398 --- Comment #4 from Dominique d'Humieres --- > This correctly gives the expected error messages since at least gfortran 5.4. > Closing as FIXED? FORALL(i=1:4) a(i) = st3 (i) is still not caught.
[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953 --- Comment #4 from Andreas Krebbel --- Looks like a problem which was fixed with r265158: S/390: Fix problem with vec_init expander gcc/ChangeLog: 2018-10-15 Andreas Krebbel * config/s390/s390.c (s390_expand_vec_init): Force vector element into reg if it isn't a general operand. gcc/testsuite/ChangeLog: 2018-10-15 Andreas Krebbel * g++.dg/vec-init-1.C: New test. I've backported the patch to GCC 7 and 8 branch on 2018-10-19. Canonical is aware of the problem and will pick the patch up for their next GCC updates. Could you please check whether this fixes your problem?
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED CC||jakub at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- --- gcc/gimple-loop-interchange.cc.jj 2019-01-01 12:37:17.416970701 +0100 +++ gcc/gimple-loop-interchange.cc 2019-01-22 11:34:42.303796570 +0100 @@ -692,7 +692,7 @@ loop_cand::analyze_induction_var (tree v iv->var = var; iv->init_val = init; iv->init_expr = chrec; - iv->step = build_int_cst (TREE_TYPE (chrec), 0); + iv->step = build_zero_cst (TREE_TYPE (chrec)); m_inductions.safe_push (iv); return true; } fixes this. SCEV is able to deal with non-integral/pointer IVs like SCALAR_FLOAT_TYPE_P in this case and create_iv as well, just build_int_cst must not be used in that case.
[Bug c++/88971] New: Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 Bug ID: 88971 Summary: Branch optimization inconsistency (missed optimization) Product: gcc Version: 8.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: maratrus at mail dot ru Target Milestone: --- Created attachment 45490 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45490&action=edit A code that demonstrates different patterns in optimization technique In the code attached I expect the compiler not to generate any code between two `mfence` instructions in the method `CheckAndPrint()`. Indeed, it does a good job if I call the `PrintGood()` method, and no code is generated. But if I out-comment `PrintBad()` or even a simple return, the compiler generates code for the if-expression `if (t.j > 0)`. In all three cases there seems to be no reason to generate any code. The code attached is compiled as: `g++ -std=c++11 -Ofast opt_template.cc -o opt_template` I must be missing something but is there a good reason why the compiler managed to optimize the code in one case but not in the other two?
[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953 --- Comment #5 from Jan Kossmann --- You are right, I verified with: gcc version 9.0.0 20190122 (experimental) (GCC) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-o' 'test.cpp.o' '-shared-libgcc' '-march=z13' '-mno-htm' '-mzarch' '-m64' gcc/bin/../libexec/gcc/s390x-ibm-linux-gnu/9.0.0/cc1plus -E -quiet -v -imultiarch s390x-linux-gnu -iprefix gcc/bin/../lib/gcc/s390x-ibm-linux-gnu/9.0.0/ -D_GNU_SOURCE test.cpp -march=z13 -mno-htm -mzarch -m64 -O3 -fpch-preprocess -o test.ii and it worked out fine. Sorry for the trouble, thanks for your help!
[Bug rtl-optimization/88948] [9 Regression] ICE in elimination_costs_in_insn, at reload1.c:3640 since r264148
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88948 --- Comment #2 from Uroš Bizjak --- The problem is with can_assign_to_reg_without_clobbers_p in gcse.c, where we have: /* If the test insn is valid and doesn't need clobbers, and the target also has no objections, we're good. */ if (icode >= 0 && (num_clobbers == 0 || !added_clobbers_hard_reg_p (icode)) && ! (targetm.cannot_copy_insn_p && targetm.cannot_copy_insn_p (test_insn))) can_assign = true; The test instruction is created as: (insn 26 0 0 (set (reg:SI 152) (fix:SI (reg:DF 89))) -1 (nil)) which is (correctly) recognized as (define_insn "fix_trunc_i387_fisttp" [(set (match_operand:SWI248x 0 "nonimmediate_operand" "=m") (fix:SWI248x (match_operand 1 "register_operand" "f"))) (clobber (match_scratch:XF 2 "=&f"))] However, recog also reports that 1 clobber needs to be added. The instruction is recognized nevertheless due to "|| !added_clobbers_hard_reg_p (icode)" bypass. The recognized insn doesn't clobber hard reg, but it also needs a clobber of a scratch reg to be recognized.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #24 from Chris Elrod --- The dump looks like this: vect__67.78_217 = SQRT (vect__213.77_225); vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 } / vect__67.78_217; vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248; vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248; vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248; so the vrsqrt optimization happens later. g++ shows the same problems with weird code generation. However this: /* sqrt(a) = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */ does not match this: vrsqrt14ps %zmm1, %zmm2 # comparison and mask removed vmulps %zmm1, %zmm2, %zmm0 vmulps %zmm2, %zmm0, %zmm1 vmulps %zmm6, %zmm0, %zmm0 vaddps %zmm7, %zmm1, %zmm1 vmulps %zmm0, %zmm1, %zmm1 vrcp14ps %zmm1, %zmm0 vmulps %zmm1, %zmm0, %zmm1 vmulps %zmm1, %zmm0, %zmm1 vaddps %zmm0, %zmm0, %zmm0 vsubps %zmm1, %zmm0, %zmm0 Recommendations on the next place to look for what's going on?
[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed|2019-01-21 00:00:00 |2019-01-22 CC||ktkachov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #4 from ktkachov at gcc dot gnu.org --- Confirmed on aarch64 then.
[Bug tree-optimization/88972] New: popcnt of limited 128-bit number with unnecessary zeroing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972 Bug ID: 88972 Summary: popcnt of limited 128-bit number with unnecessary zeroing Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: drepper.fsp+rhbz at gmail dot com Target Milestone: --- Compile the following code on x86-64 with -Ofast -march=haswell: int f(__uint128_t m) { if (m < 64000) return __builtin_popcount(m); return -1; } The generated code with the trunk gcc looks like this: 0: b8 ff f9 00 00 mov $0xf9ff,%eax 5: 48 39 f8 cmp %rdi,%rax 8: b8 00 00 00 00 mov $0x0,%eax d: 48 19 f0 sbb %rsi,%rax 10: 72 0e jb 20 12: 31 c0 xor %eax,%eax 14: f3 0f b8 c7 popcnt %edi,%eax 18: c3 retq 19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 20: b8 ff ff ff ff mov $0xffffffff,%eax 25: c3 retq The instruction at offset 12 is unnecessary. I guess this is a left-over from the popcnt of the upper half which is recognized to be unnecessary and left out. There is no addition anymore but somehow the register clearing survived.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #25 from rguenther at suse dot de --- On Tue, 22 Jan 2019, elrodc at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 > > --- Comment #24 from Chris Elrod --- > The dump looks like this: > > vect__67.78_217 = SQRT (vect__213.77_225); > vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, > 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 > } / vect__67.78_217; > vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248; > vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248; > vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248; > > so the vrsqrt optimization happens later. g++ shows the same problems with > weird code generation. However this: > > /* sqrt(a) = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) > rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */ > > does not match this: > > vrsqrt14ps %zmm1, %zmm2 # comparison and mask removed > vmulps %zmm1, %zmm2, %zmm0 > vmulps %zmm2, %zmm0, %zmm1 > vmulps %zmm6, %zmm0, %zmm0 > vaddps %zmm7, %zmm1, %zmm1 > vmulps %zmm0, %zmm1, %zmm1 > vrcp14ps %zmm1, %zmm0 > vmulps %zmm1, %zmm0, %zmm1 > vmulps %zmm1, %zmm0, %zmm1 > vaddps %zmm0, %zmm0, %zmm0 > vsubps %zmm1, %zmm0, %zmm0 > > Recommendations on the next place to look for what's going on? You can try enabling -mrecip to see RSQRT in .optimized - there's probably late 1/sqrt optimization on RTL.
[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963 --- Comment #9 from Devin Hussey --- (In reply to Andrew Pinski from comment #6) > Try using 128 (or 256) and you might see that aarch64 falls down similarly. yup. Oof. test: sub sp, sp, #560 stp x29, x30, [sp] mov x29, sp stp x19, x20, [sp, 16] mov x19, 128 mov x20, x0 add x0, sp, 176 str x21, [sp, 32] mov x21, x2 mov x2, x19 bl memcpy mov x2, x19 mov x1, x21 add x0, sp, 304 bl memcpy ldr q7, [sp, 176] mov x2, x19 ldr q6, [sp, 192] add x1, sp, 48 ldr q5, [sp, 208] mov x0, x20 ldr q4, [sp, 224] ldr q3, [sp, 240] ldr q2, [sp, 256] ldr q1, [sp, 272] ldr q0, [sp, 288] ldr q23, [sp, 304] ldr q22, [sp, 320] ldr q21, [sp, 336] ldr q20, [sp, 352] ldr q19, [sp, 368] ldr q18, [sp, 384] ldr q17, [sp, 400] ldr q16, [sp, 416] add v7.4s, v7.4s, v23.4s add v6.4s, v6.4s, v22.4s add v5.4s, v5.4s, v21.4s add v4.4s, v4.4s, v20.4s add v3.4s, v3.4s, v19.4s str q7, [sp, 48] add v2.4s, v2.4s, v18.4s str q6, [sp, 64] add v1.4s, v1.4s, v17.4s str q5, [sp, 80] add v0.4s, v0.4s, v16.4s str q4, [sp, 96] str q3, [sp, 112] str q2, [sp, 128] str q1, [sp, 144] str q0, [sp, 160] bl memcpy ldp x29, x30, [sp] ldp x19, x20, [sp, 16] ldr x21, [sp, 32] add sp, sp, 560 ret
[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Jakub Jelinek --- Fixed then on all active branches.
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964 --- Comment #4 from Jakub Jelinek --- Created attachment 45491 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45491&action=edit gcc9-pr88964.patch Untested fix.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #26 from Chris Elrod --- > You can try enabling -mrecip to see RSQRT in .optimized - there's > probably late 1/sqrt optimization on RTL. No luck. The full commands I used: gfortran -Ofast -mrecip -S -fdump-tree-optimized -march=native -shared -fPIC -mprefer-vector-width=512 -fno-semantic-interposition -o gfortvectorizationdump.s vectorization_test.f90 g++ -mrecip -Ofast -fdump-tree-optimized -S -march=native -shared -fPIC -mprefer-vector-width=512 -fno-semantic-interposition -o gppvectorization_test.s vectorization_test.cpp g++'s output was similar: vect_U33_60.31_372 = SQRT (vect_S33_59.30_371); vect_Ui33_61.32_374 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 } / vect_U33_60.31_372; vect_U13_62.33_375 = vect_S13_47.24_359 * vect_Ui33_61.32_374; vect_U23_63.34_376 = vect_S23_53.27_365 * vect_Ui33_61.32_374; and it has the same assembly as gfortran for the rsqrt: vcmpps $4, %zmm0, %zmm5, %k1 vrsqrt14ps %zmm0, %zmm1{%k1}{z} vmulps %zmm0, %zmm1, %zmm2 vmulps %zmm1, %zmm2, %zmm0 vmulps %zmm6, %zmm2, %zmm2 vaddps %zmm7, %zmm0, %zmm0 vmulps %zmm2, %zmm0, %zmm0 vrcp14ps %zmm0, %zmm10 vmulps %zmm0, %zmm10, %zmm0 vmulps %zmm0, %zmm10, %zmm0 vaddps %zmm10, %zmm10, %zmm10 vsubps %zmm0, %zmm10, %zmm10
[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950 Matthew Malcomson changed: What|Removed |Added Known to fail||5.4.0 --- Comment #5 from Matthew Malcomson --- This problem has been around for a long time -- I have seen the same fundamental problem on gcc 5.4 (when looking for a version to put in the "known to work" field). With "gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" on the same testcase, the stack_protect_test pattern gets reordered to before the second memory access (the "buf[b] = c" line), and again the stack protection does not guard this memory access. (insn:TI 8 126 16 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29) (const_int 88 [0x58])) [1 D.2834+0 S8 A64]) (unspec:DI [ (mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_SET)) (set (reg:DI 5 x5 [126]) (const_int 0 [0])) ]) stack-reorder.c:1 864 {stack_protect_set_di} (expr_list:REG_UNUSED (reg:DI 5 x5 [126]) (nil))) (insn:TI 16 8 71 (set (mem/j:QI (plus:DI (reg:DI 0 x0 [105]) (const_int 4016 [0xfb0])) [0 buf S1 A8]) (reg:QI 4 x4 [106])) stack-reorder.c:3 45 {*movqi_aarch64} (expr_list:REG_DEAD (reg:QI 4 x4 [106]) (expr_list:REG_DEAD (reg:DI 0 x0 [105]) (nil (insn 71 16 22 (parallel [ (set (reg:DI 3 x3 [125]) (unspec:DI [ (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29) (const_int 88 [0x58])) [1 D.2834+0 S8 A64]) (mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1 __stack_chk_guard+0 S8 A64]) ] UNSPEC_SP_TEST)) (clobber (reg:DI 0 x0 [127])) ]) stack-reorder.c:14 866 {stack_protect_test_di} (expr_list:REG_UNUSED (reg:DI 0 x0 [127]) (nil))) (insn:TI 22 71 140 (set (mem/j:QI (plus:DI (reg:DI 1 x1 [110]) (const_int 4016 [0xfb0])) [0 buf S1 A8]) (reg:QI 2 x2 [ c ])) stack-reorder.c:4 45 {*movqi_aarch64} (expr_list:REG_DEAD (reg:QI 2 x2 [ c ]) (expr_list:REG_DEAD (reg:DI 1 x1 [110]) (nil
[Bug target/88954] __attribute__((noplt)) doesn't work with function pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954 --- Comment #5 from Richard Biener --- For indirect calls the attributes on the function type pointed to are relevant. Unioning attributes from the actually called function (if the compiler can figure that out) can be appropriate depending on the actual attribute.
[Bug tree-optimization/88713] Vectorized code slow vs. flang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713 --- Comment #27 from Chris Elrod --- g++ -mrecip=all -O3 -fno-signed-zeros -fassociative-math -freciprocal-math -fno-math-errno -ffinite-math-only -fno-trapping-math -fdump-tree-optimized -S -march=native -shared -fPIC -mprefer-vector-width=512 -fno-semantic-interposition -o gppvectorization_test.s vectorization_test.cpp is not enough to get vrsqrt. I need -funsafe-math-optimizations for the instruction to appear in the asm.
[Bug tree-optimization/88973] New: New -Wrestrict warning since r268048
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88973 Bug ID: 88973 Summary: New -Wrestrict warning since r268048 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org CC: msebor at gcc dot gnu.org Target Milestone: --- Created attachment 45492 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45492&action=edit test-case The test-case comes from autogen package: $ gcc autogen.i -c -O2 -Werror=restrict In function ‘strcpy’, inlined from ‘canonicalize_pathname’ at autogen.i:10536:17, inlined from ‘option_pathfind.constprop’ at autogen.i:10420:32: autogen.i:4050:10: error: ‘__builtin_strcpy’ accessing 1 byte at offsets [0, 9223372036854775807] and [0, 9223372036854775807] may overlap 1 byte at offset 0 [-Werror=restrict] 4050 | return __builtin___strcpy_chk (__dest, __src, __builtin_object_size (__dest, 2 > 1)); | ^ cc1: some warnings being treated as errors Martin can you please verify that the warning is correct?
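Whether the warning is correct for the autogen code is the open question here, and the attached test case is not shown. As general background only: strcpy has undefined behaviour when source and destination overlap, so an in-place shift of a string onto itself is normally written with memmove. The helper below is a hypothetical sketch of that pattern and is not taken from autogen.

```c
#include <string.h>

/* Hypothetical helper: drop the first n bytes of s in place.
   Source and destination overlap, so strcpy (s, s + n) would be
   undefined; memmove is specified to handle overlapping regions. */
static char *drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len)
        n = len;                     /* never read past the terminator */
    memmove(s, s + n, len - n + 1);  /* +1 copies the terminating NUL */
    return s;
}
```

If canonicalize_pathname does something of this shape with strcpy, the diagnostic would be pointing at a real overlap; if the two pointers can never alias, it would be a false positive.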
[Bug c/88955] transparent_union for vector types not accepted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88955 Richard Biener changed: What|Removed |Added Keywords||rejects-valid Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-22 CC||hjl.tools at gmail dot com, ||jsm28 at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- Hmm. I guess the "issue" is that the union has TImode rather than V2DImode. stor-layout doesn't look at TYPE_TRANSPARENT_AGGR at all though. Relevant is /* If we only have one real field; use its mode if that mode's size matches the type's size. This generally only applies to RECORD_TYPE. For UNION_TYPE, if the widest field is MODE_INT then use that mode. If the widest field is MODE_PARTIAL_INT, and the union will be passed by reference, then use that mode. */ poly_uint64 type_size; if ((TREE_CODE (type) == RECORD_TYPE || (TREE_CODE (type) == UNION_TYPE && (GET_MODE_CLASS (mode) == MODE_INT || (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT && targetm.calls.pass_by_reference (pack_cumulative_args (0), mode, type, 0) && mode != VOIDmode && poly_int_tree_p (TYPE_SIZE (type), &type_size) && known_eq (GET_MODE_BITSIZE (mode), type_size)) ; else mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk (); where we reject vector modes. The C++ diagnostic is a bit more clear: > g++ t.c -S t.c:5:1: error: type transparent ‘union’ cannot be made transparent because the type of the first field has a different ABI from the class overall { ^ which hints at the implementation of the argument passing being the culprit for the restriction (not sure why the ABI of the class overall should matter given the docs of transparent_union say the ABI is specified by the first field...)
[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Richard Biener --- Fixed.
[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862
--- Comment #4 from Richard Biener ---
Author: rguenth
Date: Tue Jan 22 11:28:56 2019
New Revision: 268147
URL: https://gcc.gnu.org/viewcvs?rev=268147&root=gcc&view=rev
Log:
2019-01-22 Richard Biener
	PR tree-optimization/88862
	* graphite-scop-detection.c (scop_detection::graphite_can_represent_scev): Reject ADDR_EXPR.
Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/graphite-scop-detection.c
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964 --- Comment #5 from Richard Biener --- Hmm, I wonder if handling FP inductions during interchange causes correctness issues as well (FP rounding, etc.). Otherwise the patch looks obvious.
[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965 --- Comment #4 from Richard Biener --- LGTM
[Bug c/88968] [8/9 Regression] Stack overflow in gimplify_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88968
Richard Biener changed:
What     |Removed |Added
Priority |P3      |P2
[Bug c++/88969] [9 Regression] ICE in build_op_delete_call, at cp/call.c:6509
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969
Richard Biener changed:
What             |Removed                                       |Added
Priority         |P3                                            |P1
Target Milestone |---                                           |9.0
Summary          |ICE in build_op_delete_call, at cp/call.c:6509 |[9 Regression] ICE in build_op_delete_call, at cp/call.c:6509
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964 --- Comment #6 from Jakub Jelinek --- In the spot I'm changing it IMHO shouldn't; that + 0.0 really should be folded (and if not, we should tweak create_iv not to do any addition if real_zerop). Though of course for other floating point IVs where the step is non-zero it could make a difference.
[Bug tree-optimization/88970] ICE: verify_ssa failed (error: definition in block 2 follows the use)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88970
Richard Biener changed:
What    |Removed |Added
CC      |        |jason at gcc dot gnu.org
Version |unknown |9.0
--- Comment #2 from Richard Biener ---
Looks like a missing/incomplete DECL_EXPR.

;; Function void d() (null)
;; enabled by -tree-original

{
  typedef int e[0:(sizetype) SAVE_EXPR ];
  ^^^ shouldn't this have (ssizetype) b (1) + -1)?
  int f[0:(sizetype) SAVE_EXPR ];
  int c;
  typedef struct __lambda0 __lambda0;
  ssizetype D.2306;

  < (1) + -1) >;
  <];>>;
  int c;
  <::operator() (&TARGET_EXPR ) >;
}
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971
Richard Biener changed:
What      |Removed |Added
Keywords  |        |missed-optimization
CC        |        |rguenth at gcc dot gnu.org
Component |c++     |libstdc++
--- Comment #1 from Richard Biener ---
This is because it still needs to generate the std::string objects at the caller site (outside of the if (print)). This involves quite some code to get rid of, and even at -O3 we do not inline basic_string::basic_string it seems (ISTR that is out-of-line in the library):

  __asm__ __volatile__("mfence" : : : "memory");
  _6 = MEM[(const int *)&data + 4B];
  if (_6 > 0)
    goto ; [41.48%]
  else
    goto ; [58.52%]

  [local count: 445388109]:
  std::basic_string::basic_string (&D.39204, "<", &D.39205);
  _7 = MEM[(char * *)&D.39204];
  _8 = _7 + 18446744073709551592;
  if (_8 != &_S_empty_rep_storage)
    goto ; [10.00%]
  else
    goto ; [90.00%]

  [local count: 434030711]:
  goto ; [100.00%]

  [local count: 44538811]:
  if (__gthrw___pthread_key_create != 0B)
    goto ; [53.47%]
  else
    goto ; [46.53%]

  [local count: 23814902]:
  _9 = &MEM[(struct _Rep *)_7 + -24B].D.23940._M_refcount;
  _10 = __atomic_fetch_add_4 (_9, 4294967295, 4);
  _11 = (int) _10;
  goto ; [100.00%]

  [local count: 20723909]:
  __result_12 = MEM[(_Atomic_word *)_7 + -8B];
  _13 = __result_12 + -1;
  MEM[(_Atomic_word *)_7 + -8B] = _13;

  [local count: 44538811]:
  # _14 = PHI <_11(6), __result_12(7)>
  if (_14 <= 0)
    goto ; [25.50%]
  else
    goto ; [74.50%]

  [local count: 11357397]:
  std::basic_string::_Rep::_M_destroy (_8, &D.39206);

  [local count: 445388108]:
  D.39206 ={v} {CLOBBER};
  D.39204 ={v} {CLOBBER};
  D.39205 ={v} {CLOBBER};

  [local count: 1073741825]:
  __asm__ __volatile__("mfence" : : : "memory");
  data ={v} {CLOBBER};
[Bug target/88972] popcnt of limited 128-bit number with unnecessary zeroing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972
Richard Biener changed:
What             |Removed           |Added
Keywords         |                  |missed-optimization
Target           |                  |x86_64-*-*, i?86-*-*
Status           |UNCONFIRMED       |NEW
Last reconfirmed |                  |2019-01-22
Component        |tree-optimization |target
Ever confirmed   |0                 |1
--- Comment #1 from Richard Biener ---
Err, __builtin_popcount has an integer argument so you call popcount on (int)m. The reason must be different.

(insn 17 16 26 4 (parallel [
            (set (reg:SI 88 [ ])
                (popcount:SI (subreg:SI (reg/v:TI 89 [ m ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) "t.c":4 -1
     (nil))
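The remark in comment #1 about the argument type can be illustrated with a small sketch. __builtin_popcount takes an unsigned int, so a 128-bit argument is converted and only its low 32 bits are counted (assuming a 32-bit int, as on x86_64). The helper names below are hypothetical and this is not the reporter's test case; it assumes a 64-bit target where the unsigned __int128 extension is available and unsigned long long is 64 bits wide.

```c
#include <assert.h>

/* unsigned __int128 is a GNU extension on 64-bit targets.  */
typedef unsigned __int128 u128;

/* What a call like __builtin_popcount (m) effectively computes: the
   argument is converted to unsigned int first.  */
static int
popcount_low32 (u128 m)
{
  return __builtin_popcount ((unsigned int) m);
}

/* Counting all 128 bits needs two 64-bit popcounts.  */
static int
popcount128 (u128 m)
{
  return __builtin_popcountll ((unsigned long long) m)
	 + __builtin_popcountll ((unsigned long long) (m >> 64));
}

static void
demo_popcount (void)
{
  /* Bit 100 plus the four lowest bits.  */
  u128 m = ((u128) 1 << 100) | 0xF;
  assert (popcount_low32 (m) == 4);
  assert (popcount128 (m) == 5);
}
```

If the value is known to fit in 32 bits, as the bug title's "limited 128-bit number" suggests, the two helpers agree.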
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964 --- Comment #7 from Jakub Jelinek --- Actually no, with HONOR_SIGNED_ZEROS it shouldn't be optimized out. So, if we have no other way to distinguish between a normal chrec with step +0.0 and a loop invariant var, we should punt at least for HONOR_SIGNED_ZEROS.
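The reason x + 0.0 cannot simply be folded to x when signed zeros are honored is that IEEE 754 addition gives -0.0 + 0.0 = +0.0 in the default round-to-nearest mode, so the sign bit changes for x = -0.0. A minimal sketch, with hypothetical helper names and assuming the code is built without -ffast-math or -fno-signed-zeros:

```c
#include <assert.h>
#include <math.h>

/* The addition a compiler might be tempted to fold to plain x.  */
static double
add_pos_zero (double x)
{
  return x + 0.0;
}

/* Returns 1 if x and x + 0.0 differ in their sign bit, else 0.  */
static int
sign_changed_by_add_zero (double x)
{
  return (signbit (x) != 0) != (signbit (add_pos_zero (x)) != 0);
}

static void
demo_signed_zero (void)
{
  assert (sign_changed_by_add_zero (-0.0) == 1); /* -0.0 + 0.0 is +0.0 */
  assert (sign_changed_by_add_zero (0.0) == 0);
  assert (sign_changed_by_add_zero (1.5) == 0);
}
```

Only the negative-zero input is affected, which is why the concern is tied to HONOR_SIGNED_ZEROS.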
[Bug tree-optimization/88973] [8/9 Regression] New -Wrestrict warning since r268048
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88973
Richard Biener changed:
What             |Removed                              |Added
Keywords         |                                     |diagnostic
Priority         |P3                                   |P2
Known to work    |                                     |8.2.0
Target Milestone |---                                  |8.3
Summary          |New -Wrestrict warning since r268048 |[8/9 Regression] New -Wrestrict warning since r268048
Known to fail    |                                     |8.2.1
--- Comment #1 from Richard Biener ---
I believe the change was backported as well.
[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964
Jakub Jelinek changed:
What                          |Removed |Added
Attachment #45491 is obsolete |0       |1
--- Comment #8 from Jakub Jelinek ---
Created attachment 45493
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45493&action=edit
gcc9-pr88964.patch

Updated patch.
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971
--- Comment #2 from Jonathan Wakely ---
(In reply to Richard Biener from comment #1)
> rid of, and even at -O3 we do not inline basic_string::basic_string it seems
> (ISTR that is out-of-line in the library):

There's an explicit instantiation in the library, but the definition is inline in the headers. If the compiler wanted to inline it, all the code is visible and nothing forces it to use the explicit instantiation in the library.
[Bug target/88972] popcnt of limited 128-bit number with unnecessary zeroing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972
Uroš Bizjak changed:
What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |INVALID
--- Comment #2 from Uroš Bizjak ---
This is by design.

/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
   for bit-manipulation instructions.  */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
	  m_SANDYBRIDGE | m_CORE_AVX2 | m_GENERIC)
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971
--- Comment #3 from Jonathan Wakely ---
(In reply to Richard Biener from comment #1)
> This is because it still needs to generate the std::string objects at the
> caller
> site (outside of the if (print)). This involves quite some code to get
> rid of, and even at -O3 we do not inline basic_string::basic_string it seems
> (ISTR that is out-of-line in the library):
>
> __asm__ __volatile__("mfence" : : : "memory");
> _6 = MEM[(const int *)&data + 4B];
> if (_6 > 0)
> goto ; [41.48%]
> else
> goto ; [58.52%]
>
>[local count: 445388109]:
> std::basic_string::basic_string (&D.39204, "<", &D.39205);
> _7 = MEM[(char * *)&D.39204];
> _8 = _7 + 18446744073709551592;
> if (_8 != &_S_empty_rep_storage)
> goto ; [10.00%]
> else
> goto ; [90.00%]

Looks like you're using -D_GLIBCXX_USE_CXX11_ABI=0 but the OP is not.
[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952
--- Comment #11 from Uroš Bizjak ---
(In reply to Christopher Leonard from comment #10)
> Getting contradictory statements now:
> >reg:reg+1 maps to lo:hi on x86.
> >On x86, we don't allow register pairs in asm at all.
>
> Not allowing, or printing a warning, is much better behavior than what I
> have been getting on PPC.

Ah, sorry - x86 emits a warning.