[Bug target/56619] i386 hle atomic intrinsics flags are undocumented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56619 --- Comment #2 from ak at gcc dot gnu.org 2013-03-15 04:31:53 UTC --- Author: ak Date: Fri Mar 15 04:31:43 2013 New Revision: 196671 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196671 Log: Document HLE / RTM intrinsics The TSX HLE/RTM intrinsics were missing documentation. Add this to the manual. gcc/: 2013-03-14 Andi Kleen PR target/56619 * doc/extend.texi: Document __ATOMIC_HLE_ACQUIRE, __ATOMIC_HLE_RELEASE. Document __builtin_ia32 TSX intrinsics. Document _x* TSX intrinsics. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/extend.texi
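For illustration, here is a minimal sketch of how the two memory-model flags named in this commit are typically combined with the __atomic builtins to build an elided spinlock. The names lockvar, shared_counter, HLE_ACQUIRE_FLAG and HLE_RELEASE_FLAG are assumptions made for this example; the flags fall back to 0 when the compiler does not define the x86-specific macros.

```c
#include <assert.h>

/* Use the HLE flags when the compiler provides them (x86 GCC); otherwise
   fall back to plain acquire/release.  The *_FLAG names are hypothetical. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQUIRE_FLAG __ATOMIC_HLE_ACQUIRE
#else
#define HLE_ACQUIRE_FLAG 0
#endif
#ifdef __ATOMIC_HLE_RELEASE
#define HLE_RELEASE_FLAG __ATOMIC_HLE_RELEASE
#else
#define HLE_RELEASE_FLAG 0
#endif

static int lockvar;          /* hypothetical lock word: 0 = free, 1 = held */
static long shared_counter;  /* data protected by the lock */

/* Acquire the lock; the HLE flag is OR-ed into the memory model. */
static void hle_lock(void)
{
    while (__atomic_exchange_n(&lockvar, 1,
                               __ATOMIC_ACQUIRE | HLE_ACQUIRE_FLAG))
        ; /* spin until the previous value was 0 */
}

/* Release the lock with the matching release flag. */
static void hle_unlock(void)
{
    __atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE | HLE_RELEASE_FLAG);
}

/* Increment the counter under the lock and return the new value. */
long locked_increment(void)
{
    hle_lock();
    long v = ++shared_counter;
    hle_unlock();
    return v;
}
```

On hardware without TSX the lock should simply behave like an ordinary spinlock, so the sketch does not depend on elision actually happening.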
[Bug target/55139] __atomic store does not support __ATOMIC_HLE_RELEASE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139 --- Comment #5 from ak at gcc dot gnu.org 2012-11-09 15:24:32 UTC --- Author: ak Date: Fri Nov 9 15:24:25 2012 New Revision: 193363 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193363 Log: Handle target specific memory models in C frontend get_atomic_generic_size would error out for __atomic_store(...,__ATOMIC_HLE_RELEASE) Just mask it out. All the memory orders are checked completely in builtins.c anyways. I'm not sure what that check is for, it could be removed in theory. Passed bootstrap and test suite on x86-64 gcc/c-family/: 2012-11-09 Andi Kleen PR 55139 * c-common.c (get_atomic_generic_size): Mask with MEMMODEL_MASK Modified: trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.c
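The masking idea in this commit can be sketched as follows. This is not the code from c-common.c: MODEL_MASK, TARGET_FLAG_HLE_RELEASE and valid_store_model are hypothetical names, and the bit layout (base model in the low 16 bits, target flags above) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout for illustration only: the base memory model lives in
   the low 16 bits and target-specific flags sit above it. */
#define MODEL_MASK 0xFFFF
#define TARGET_FLAG_HLE_RELEASE (1 << 16)

/* Validate the memory model of an atomic store.  Target flags are masked
   off first, so a flagged model is judged by its base model only. */
bool valid_store_model(int model)
{
    int base = model & MODEL_MASK;
    return base == __ATOMIC_RELAXED
        || base == __ATOMIC_RELEASE
        || base == __ATOMIC_SEQ_CST;
}
```

Without the mask, a value such as __ATOMIC_RELEASE | TARGET_FLAG_HLE_RELEASE would not compare equal to any base model and would be rejected.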
[Bug lto/46905] -flto -fno-lto does not disable lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46905 --- Comment #3 from ak at gcc dot gnu.org 2010-12-19 19:36:29 UTC --- Author: ak Date: Sun Dec 19 19:36:25 2010 New Revision: 168071 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168071 Log: Fix -fno-lto (PR lto/46905) gcc/ 2010-12-19 Andi Kleen PR lto/46905 * collect2.c (main): Handle -fno-lto. * opts.c (common_handle_option): Handle -fno-lto. Modified: trunk/gcc/ChangeLog trunk/gcc/collect2.c trunk/gcc/opts.c
[Bug lto/50679] Linux kernel LTO tracking bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50679 ak at gcc dot gnu.org changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #1 from ak at gcc dot gnu.org 2011-10-09 14:06:44 UTC --- Currently I have to revert 2 patches to get anywhere near a build and work around 50644. 32bit builds don't work at all because of the tree-nrv problem. 50620 causes incredibly slow builds because partitioning has to be disabled.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #15 from ak at gcc dot gnu.org 2011-10-17 14:43:45 UTC --- Author: ak Date: Mon Oct 17 14:43:37 2011 New Revision: 180093 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180093 Log: Use MADV_DONTNEED for freeing in garbage collector Use the Linux MADV_DONTNEED call to unmap free pages in the garbage collector. Then keep the unmapped pages in the free list. This avoids excessive memory fragmentation on large LTO builds, which can lead to gcc bumping into the Linux vm_max_map limit per process. gcc/: 2011-10-08 Andi Kleen PR other/50636 * config.in, configure: Regenerate. * configure.ac (madvise): Add to AC_CHECK_FUNCS. * ggc-page.c (USING_MADVISE): Add. (page_entry): Add discarded field. (alloc_page): Check for discarded pages. (release_pages): Add USING_MADVISE branch. Modified: trunk/gcc/ChangeLog trunk/gcc/config.in trunk/gcc/configure trunk/gcc/configure.ac trunk/gcc/ggc-page.c
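The technique described here, discarding a page's contents while keeping its address range mapped, can be sketched in a few lines. This is not the ggc-page.c code; discard_and_reuse_page is a hypothetical helper written for this example, and it assumes a Linux-like system with anonymous mmap and madvise.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise under strict -std= */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page, dirty it, hand the memory back with MADV_DONTNEED, then
   reuse the same address range without calling mmap again.
   Returns 0 on success and -1 on any failure. */
int discard_and_reuse_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);               /* the page is now backed by memory */

    /* Release the backing memory but keep the mapping itself. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    p[0] = 1;                           /* still mapped: touching it is fine */
    int ok = (p[0] == 1);

    munmap(p, len);
    return ok ? 0 : -1;
}
```

Because the range is never unmapped and remapped, the number of distinct mappings in the process stays constant, which is the property the commit message relies on.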
[Bug middle-end/88573] 9 regression: error: type mismatch in component reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573 ak at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||ak at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #2 from ak at gcc dot gnu.org --- Dup *** This bug has been marked as a duplicate of bug 88140 ***
[Bug lto/88140] [9 Regression] ICE: verify_gimple failed since r266325
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88140 ak at gcc dot gnu.org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #10 from ak at gcc dot gnu.org --- *** Bug 88573 has been marked as a duplicate of this bug. ***
[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355 --- Comment #2 from ak at gcc dot gnu.org --- Author: ak Date: Mon Dec 11 16:13:53 2017 New Revision: 255540 URL: https://gcc.gnu.org/viewcvs?rev=255540&root=gcc&view=rev Log: Fix stack overflow with autofdo (PR83355) g++.dg/bprob* is failing currently with autofdo. Running in gdb shows that there is a very deep recursion in get_index_by_decl until it overflows the stack. gcc/: 2017-12-11 Andi Kleen PR gcov-profile/83355 * auto-profile.c (string_table::get_index_by_decl): Don't recurse when abstract origin points to itself. Modified: trunk/gcc/ChangeLog trunk/gcc/auto-profile.c trunk/gcc/lto-streamer-in.c
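The failure mode here is a walk along "origin" links that never terminates when a node names itself as its own origin. A minimal sketch of the guard, using a hypothetical struct decl in place of GCC's tree nodes and an iterative walk instead of the recursion in auto-profile.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration node. */
struct decl {
    const char *name;
    struct decl *abstract_origin;   /* NULL, another decl, or the decl itself */
};

/* Follow abstract_origin links to the last node of the chain.  The walk
   stops when the origin is NULL or points back at the current node, which
   is the case that caused unbounded recursion. */
const struct decl *ultimate_origin(const struct decl *d)
{
    while (d->abstract_origin != NULL && d->abstract_origin != d)
        d = d->abstract_origin;
    return d;
}
```

This sketch only guards against a node pointing at itself; longer cycles would need a visited set or a step limit.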
[Bug c++/55223] [C++11] Default lambda expression of a templated class member
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55223 --- Comment #2 from ak at gcc dot gnu.org 2013-01-20 19:03:29 UTC --- Author: ak Date: Sun Jan 20 19:03:22 2013 New Revision: 195321 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195321 Log: libstdc++: Add mem_order_hle_acquire/release to atomic.h v2 The underlying compiler supports additional __ATOMIC_HLE_ACQUIRE/RELEASE memmodel flags for TSX, but this was not exposed to the C++ wrapper. Handle it there. These are additional flags, so some of assert checks need to mask off the flags before checking the memory model type. libstdc++-v3/: 2013-01-12 Andi Kleen Jonathan Wakely PR libstdc++/55223 * include/bits/atomic_base.h (__memory_order_modifier): Add __memory_order_mask, __memory_order_modifier_mask, __memory_order_hle_acquire, __memory_order_hle_release. (operator|,operator&): Add. (__cmpexch_failure_order): Rename to __cmpexch_failure_order2. (__cmpexch_failure_order): Add. (clear, store, load, compare_exchange_weak, compare_exchange_strong): Handle flags. * testsuite/29_atomics/atomic_flag/test_and_set/explicit-hle.cc: Add. Added: trunk/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit-hle.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/atomic_base.h
[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 --- Comment #6 from ak at gcc dot gnu.org --- Author: ak Date: Fri May 12 10:09:50 2017 New Revision: 247962 URL: https://gcc.gnu.org/viewcvs?rev=247962&root=gcc&view=rev Log: Limit perf data buffer during profiling With high -j parallelism the autofdo tests can randomly fail. autofdo uses Linux perf to record profiling data. Linux perf uses a locked perf buffer. By default it has around 516k buffer per uid (/proc/sys/kernel/perf_event_mlock_kb). An individual perf record tries to grab the full 516k, which makes parallel perf record fail. This patch limits the perf buffer for individual perf record to 8k. With the default settings this allows a parallelism of the test cases of 16, which is hopefully good enough (if not, we would need to add some kind of semaphore, or ask the user to increase the limit as root). I also removed an unneeded -o perf.data option. Thanks to Marcin for finally spotting the problem. Passes bootstrap and test on x86_64-linux. Ok for trunk? gcc/testsuite/: 2017-05-12 Andi Kleen PR testsuite/77684 * lib/target-supports.exp (profopt-perf-wrapper): Add -m8 option to increase parallelism. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/lib/target-supports.exp
[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804 --- Comment #11 from ak at gcc dot gnu.org --- Author: ak Date: Tue Nov 11 05:10:58 2014 New Revision: 217336 URL: https://gcc.gnu.org/viewcvs?rev=217336&root=gcc&view=rev Log: Error out for Cilk_spawn or array expression in forbidden places _Cilk_spawn or Cilk array expressions are only allowed on their own, but not in for(), if(), switch, do, while, goto, etc. The C parser didn't always check for that, which led to ICEs earlier for invalid code. Add a generic helper that checks this and call it where needed in the C frontend. I chose to allow spawn/array in for() init and increment expressions. While the Cilk spec could be interpreted to forbid it there too, there didn't seem to be any reason not to allow it. One dark corner is spawn, array in statement expressions not at the end. Right now that's forbidden too. gcc/c-family/: 2014-11-10 Andi Kleen PR c/60804 * c-common.h (check_no_cilk): Declare. * cilk.c (get_error_location): New function. (check_no_cilk): Ditto. gcc/c/: 2014-11-10 Andi Kleen PR c/60804 * c-parser.c (c_parser_statement_after_labels): Call check_no_cilk. (c_parser_if_statement): Ditto. (c_parser_switch_statement): Ditto. (c_parser_while_statement): Ditto. (c_parser_do_statement): Ditto. (c_parser_for_statement): Ditto. * c-typeck.c (c_finish_loop): Ditto. Modified: trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.h trunk/gcc/c-family/cilk.c trunk/gcc/c/ChangeLog trunk/gcc/c/c-parser.c trunk/gcc/c/c-typeck.c
[Bug middle-end/60467] ICE with -fcilkplus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60467 ak at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||ak at gcc dot gnu.org Resolution|--- |FIXED --- Comment #5 from ak at gcc dot gnu.org --- Definitely fixed.
[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #9 from ak at gcc dot gnu.org --- This still ICEs with gcc version 5.0.0 20140926 (experimental) (GCC)
[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804 --- Comment #10 from ak at gcc dot gnu.org --- Reduced test case. It's probably invalid cilk, but gcc shouldn't ICE: fn1() { if (_Cilk_spawn func_2()) ; }
[Bug c/61898] Variadic functions accept va_list without warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898 --- Comment #1 from ak at gcc dot gnu.org --- I agree such a warning would make sense.
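For context, the mistake such a warning would catch is handing a va_list to a function that expects the variadic arguments themselves, for example calling snprintf where vsnprintf was meant. A minimal sketch of the correct forwarding pattern follows; format_msg is a hypothetical wrapper written for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format into buf like snprintf does.  The va_list is forwarded to
   vsnprintf, the variant that takes a va_list.  Passing ap to snprintf
   instead would compile without a diagnostic but is wrong, because
   snprintf would treat ap as an ordinary variadic argument. */
int format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```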
[Bug c/63398] New: Cilk errors out incorrectly for spawn inside statement expressions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63398 Bug ID: 63398 Summary: Cilk errors out incorrectly for spawn inside statement expressions Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Like in: void f2(); int f() { return ({ _Cilk_spawn f2(); 0; }); } and some other places that use contains_cilk_spawn_stmt to check for errors. But that should be legal. The problem is that the walk_tree in contains_cilk_spawn_stmt doesn't stop recursing into the statement.
[Bug tree-optimization/56580] Internal compiler error when trying to compile a sequence of NOPs inside a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56580 ak at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED CC||ak at gcc dot gnu.org Resolution|--- |FIXED --- Comment #5 from ak at gcc dot gnu.org --- Fixed since some time in trunk with 2013-09-08 Andi Kleen * tree-inline.c (estimate_num_insns): Limit asm cost to 1000.
[Bug c++/63472] transaction_atomic within while loop causes ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472 ak at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-10-08 CC||ak at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from ak at gcc dot gnu.org --- Confirmed with trunk. Program received signal SIGSEGV, Segmentation fault. 0x008805b1 in copy_bbs (bbs=0x1e8ecc8, n=9, new_bbs=0x1e8e810, edges=0x0, num_edges=0, new_edges=0x0, base=0x0, after=0x76c3aa90, update_dominance=true) at ../../gcc/gcc/cfghooks.c:1335 1335 if (dom_bb->flags & BB_DUPLICATED) (gdb) p dom_bb->flags Cannot access memory at address 0x50 (gdb)
[Bug c++/63472] transaction_atomic within while loop causes ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472 --- Comment #2 from ak at gcc dot gnu.org --- Looks like there are more problems with -fgnu-tm. I hacked csmith to generate random __transaction_atomic blocks and I got a lot of crashes immediately. All I looked at were variants of these two:

0x8e23b7 crash_signal ../../gcc/gcc/toplev.c:340
0x92df5c copy_loops ../../gcc/gcc/tree-inline.c:2379
0x93225c copy_cfg_body ../../gcc/gcc/tree-inline.c:2583
0x93225c copy_body ../../gcc/gcc/tree-inline.c:2777
0x935ab3 tree_function_versioning(tree_node*, tree_node*, vec*, bool, bitmap_head*, bool, bitmap_head*, basic_block_def*)

and

0x6d7465 expand_expr_addr_expr_1 ../../gcc/gcc/expr.c:7737
0x6cd9a6 expand_expr_addr_expr ../../gcc/gcc/expr.c:7779
0x6cd9a6 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/gcc/expr.c:10604
0x6084f1 expand_normal ../../gcc/gcc/expr.h:457
0x6084f1 precompute_register_parameters ../../gcc/gcc/calls.c:832
0x6084f1 expand_call(tree_node*, rtx_def*, int) ../../gcc/gcc/calls.c:3002
0x5fbeb0 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) ../../gcc/gcc/builtins.c:6825
0x6cdd95 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/gcc/expr.c:10369
0x6d751a store_expr(tree_node*, rtx_def*, int, bool) ../../gcc/gcc/expr.c:5337
0x6dc2d9 expand_assignment(tree_node*, tree_node*, bool)
[Bug c++/63472] transaction_atomic within while loop causes ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472 --- Comment #3 from ak at gcc dot gnu.org --- Another one:

0x8e23b7 crash_signal ../../gcc/gcc/toplev.c:340
0x61be46 copy_bbs(basic_block_def**, unsigned int, basic_block_def**, edge_def**, unsigned int, edge_def**, loop*, basic_block_def*, bool) ../../gcc/gcc/cfghooks.c:1335
0x8eaecf ipa_uninstrument_transaction ../../gcc/gcc/trans-mem.c:4093
0x8eaecf ipa_tm_scan_calls_transaction ../../gcc/gcc/trans-mem.c:4167
0x8eaecf ipa_tm_execute ../../gcc/gcc/trans-mem.c:5340
0x8eaecf execute ../../gcc/gcc/trans-mem.c:5578
[Bug c++/63472] transaction_atomic within while loop causes ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472 --- Comment #4 from ak at gcc dot gnu.org --- Reduced test cases for all three crashes. I suspect several of them have a similar root cause (except perhaps for the expand_expr_addr_expr_1 one). It looks like the transaction code messes up cfgloops.

copy_bbs: (illegal code due to goto into transaction?)

g_56[];
fn1() {
  int *p_79;
  if (g_56[7])
    __transaction_atomic {
    lbl_196:
      *p_79 = 1;
    }
  else
    goto lbl_196;
}

expand_expr_addr_expr_1:

struct {
  unsigned : 7;
  signed f6 : 4
} g_35;
safe_rshift_func_uint16_t_u_s() {}
func_28() {
  __transaction_atomic { safe_rshift_func_uint16_t_u_s(g_35.f6); }
}

copy_loops:

func_65() {
  __transaction_atomic {
    for (;;)
      func_65();
  }
}
[Bug other/43448] gccbug should be removed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43448 --- Comment #1 from ak at gcc dot gnu.org 2010-10-18 09:39:19 UTC --- Author: ak Date: Mon Oct 18 09:39:15 2010 New Revision: 165613 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165613 Log: Remove gccbug gcc/ 2010-10-18 Andi Kleen PR other/43448 * gccbug.in: Remove. * Makefile.in (GCCBUG_INSTALL_NAME, gccbug): Remove. (doc, distclean, install-common): Remove reference to gccbug. * configure: Regenerate. * configure.ac (all_outputs): Remove gccbug. * doc/configfiles.texi: Remove references to gccbug. * doc/sourcebuild.texi: Ditto. contrib/ 2010-10-18 Andi Kleen * gccbug.el: Remove. Removed: trunk/contrib/gccbug.el trunk/gcc/gccbug.in Modified: trunk/contrib/ChangeLog trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/configure trunk/gcc/configure.ac trunk/gcc/doc/configfiles.texi trunk/gcc/doc/sourcebuild.texi
[Bug tree-optimization/36602] memset should be optimized into an empty CONSTRUCTOR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36602 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org, ||mjambor at suse dot cz --- Comment #3 from ak at gcc dot gnu.org 2011-06-22 20:30:56 UTC --- I ran into a similar problem in my code. It would be nice if memset didn't break SRA.
[Bug target/93768] Use vpternlog for composite logical operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93768 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from ak at gcc dot gnu.org --- Most of it is already done as part of PR101989. One issue is that it is only for suitable vector types; it doesn't really work for scalars because the compiler has no idea that a conversion might be profitable. Perhaps it would be an interesting (but likely separate) feature to define some framework to figure out if switching to the vector ISA is worth it. *** This bug has been marked as a duplicate of bug 101989 ***
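As background for the composite operation discussed in these two bugs, the scalar form of (a & b) | (c & ~b) and the 8-bit truth table that describes it can be sketched as below. The function names are hypothetical, and the bit ordering of the table (a as index bit 2, b as bit 1, c as bit 0) is an assumption of this example; how the operands map onto the instruction's registers is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: take bits of a where b is set, bits of c elsewhere. */
static uint32_t bit_select(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (c & ~b);
}

/* Build the 8-entry truth table of bit_select.  Feeding the constants
   0xF0, 0xCC and 0xAA enumerates all eight (a, b, c) input combinations
   at once, with a as index bit 2, b as bit 1 and c as bit 0. */
uint8_t truth_table_imm8(void)
{
    return (uint8_t)bit_select(0xF0, 0xCC, 0xAA);
}
```

Any three-input bitwise function has such a table, which is why a single ternary-logic instruction can replace a chain of and/or/not operations on vector types.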
[Bug target/101989] Fail to optimize (a & b) | (c & ~b) to vpternlog instruction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101989 ak at gcc dot gnu.org changed: What|Removed |Added CC||rth at gcc dot gnu.org --- Comment #8 from ak at gcc dot gnu.org --- *** Bug 93768 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/115866] missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 ak at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED CC||ak at gcc dot gnu.org --- Comment #6 from ak at gcc dot gnu.org --- Change checked in
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/115130] [meta-bug] early break vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug 115130 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug testsuite/116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080 ak at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED CC||ak at gcc dot gnu.org --- Comment #12 from ak at gcc dot gnu.org --- Patch checked in
[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520 ak at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|WAITING Resolution|DUPLICATE |--- --- Comment #6 from ak at gcc dot gnu.org --- No, this is not a dup. This bug is about early break. The other bug is about switch.
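To make the distinction concrete, here are two illustrative loops. They are not the test cases from either bug, and the function names are made up for this example. The first has a switch in the loop body but always runs all n iterations; the second leaves the loop as soon as one of several conditions holds, which is the early-break shape.

```c
#include <assert.h>
#include <stddef.h>

/* Switch inside the loop body: the trip count does not depend on the
   data, so every iteration executes. */
int count_vowels(const char *s, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        switch (s[i]) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            count++;
            break;
        default:
            break;
        }
    }
    return count;
}

/* Early break with multiple conditions: the loop exits at the first
   element equal to either key, so the trip count depends on the data. */
ptrdiff_t find_first_of_two(const int *v, size_t n, int key1, int key2)
{
    for (size_t i = 0; i < n; i++)
        if (v[i] == key1 || v[i] == key2)
            return (ptrdiff_t)i;
    return -1;
}
```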
[Bug tree-optimization/115866] missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 ak at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #10 from ak at gcc dot gnu.org --- It is still fixed.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116520, which changed state. Bug 116520 Summary: Multiple condition lead to missing vectorization due to missing early break https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520 What|Removed |Added Status|RESOLVED|WAITING Resolution|DUPLICATE |---
[Bug tree-optimization/115130] [meta-bug] early break vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug 115130 depends on bug 116520, which changed state. Bug 116520 Summary: Multiple condition lead to missing vectorization due to missing early break https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520 What|Removed |Added Status|RESOLVED|WAITING Resolution|DUPLICATE |---
[Bug tree-optimization/115130] [meta-bug] early break vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug 115130 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED
[Bug preprocessor/79465] infinite #include cycle is not detected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79465 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org Resolution|DUPLICATE |--- Last reconfirmed||2024-06-26 Ever confirmed|0 |1 Status|RESOLVED|NEW
[Bug c/115704] New: -Wstringop-overread and related warnings should print inline stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115704 Bug ID: 115704 Summary: -Wstringop-overread and related warnings should print inline stack Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Target Milestone: --- Forked from PR115274. These warnings often depend on inlining and the exact caller, and the user needs to know that to determine whether they are real or not.
[Bug tree-optimization/115274] Bogus -Wstringop-overread in SQLite source code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115274 ak at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2024-06-29 CC||ak at gcc dot gnu.org --- Comment #9 from ak at gcc dot gnu.org --- creduce minimized it to

#include <string.h>
char *c;
void a();
int b(char *d) { return strlen(d); }
void e() {
  long f = 1;
  f = b(c + f);
  if (c == 0)
    a(f);
}

From the reduced case it seems to be invalid because the c global is indeed NULL, but it's hard to say if it is exactly equivalent because it will depend on the caller and the original test case had something like 30+ callers, so we don't know the exact context. The problem is that these warnings, which depend on inlining, should really print the inline stack for the instance that triggers the warning. I opened PR115704.
[Bug c++/115728] Feature Request: inline assembly improvements for C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115728 ak at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED CC||ak at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #3 from ak at gcc dot gnu.org --- The constexpr asm support is in trunk. It supports templates.

> The second is I want finer grain control over marking memory regions as
> needing to be updated before inline assembly code is executed, or
> invalidated after.

You can do that by specifying the memory region to be updated in the input/output list.
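A minimal sketch of what "specifying the memory region in the input/output list" can look like, assuming GCC-style extended asm. The helper name touch_region16 and the fixed 16-byte size are assumptions for this example, and the asm template is deliberately empty so the sketch stays target independent.

```c
#include <assert.h>
#include <string.h>

/* Tell the compiler that exactly the 16 bytes at buf may be read and
   written by the asm statement.  The "+m" operand names the region as an
   array of 16 chars, so no blanket "memory" clobber is needed. */
static inline void touch_region16(void *buf)
{
    __asm__ volatile("" : "+m"(*(char (*)[16])buf));
}

/* Fill a buffer, pass it through the asm, and read one byte back.  The
   empty template changes nothing, so the stored value is still there. */
int region_demo(void)
{
    char buf[16];
    memset(buf, 7, sizeof buf);
    touch_region16(buf);
    return buf[3];
}
```

Compared with a "memory" clobber, this leaves the compiler free to keep unrelated values in registers across the asm statement.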
[Bug c/83324] [feature request] Pragma or special syntax for guaranteed tail calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83324 ak at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED CC||ak at gcc dot gnu.org Status|ASSIGNED|RESOLVED --- Comment #27 from ak at gcc dot gnu.org --- Implemented in trunk in a mostly LLVM compatible way. There are some remaining open issues (PR116019, PR115979, PR115606, PR115607) , but none should be show stoppers. There are some differences to clang, mainly that gcc handles a few cases that clang doesn't, but clang handles more cases with -O0. The success also depends on the architecture and the languages (C is better than C++ due to PR115606)
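A small sketch of how the attribute might be used in portable code. MUSTTAIL is a hypothetical wrapper macro introduced for this example; it expands to the musttail statement attribute only when the compiler reports support for it and otherwise to nothing, in which case the call is an ordinary one.

```c
#include <assert.h>

/* Use the musttail statement attribute when the compiler reports it;
   MUSTTAIL is a made-up helper name, not part of any compiler. */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL /* unsupported: falls back to a normal call */
#endif

/* Sum 1..n with an accumulator.  The recursive call is in tail position
   and has the same signature as the caller, which keeps it within the
   simple cases a guaranteed tail call can handle. */
long sum_to(long n, long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

When the attribute is honored, the compiler either emits a real tail call or reports an error, so stack usage no longer depends on the optimization level.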
[Bug middle-end/116510] [15 Regression] ice in decompose, at wide-int.h:1049
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116510 --- Comment #12 from ak at gcc dot gnu.org --- Like this? It fixes the test case. I'm not sure why you want AND_EXPR, this is a truth formula. Maybe it should be TRUTH_ANDIF_EXPR though to short circuit.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 90c754a48147..376a4642954d 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1477,10 +1477,12 @@ predicate_bbs (loop_p loop)
 		{
 		  tree low = build2_loc (loc, GE_EXPR,
 					 boolean_type_node,
-					 index, CASE_LOW (label));
+					 index, fold_convert_loc (loc, TREE_TYPE (index),
+					 CASE_LOW (label)));
 		  tree high = build2_loc (loc, LE_EXPR,
 					  boolean_type_node,
-					  index, CASE_HIGH (label));
+					  index, fold_convert_loc (loc, TREE_TYPE (index),
+					  CASE_HIGH (label)));
 		  case_cond = build2_loc (loc, TRUTH_AND_EXPR,
 					  boolean_type_node,
 					  low, high);
@@ -1489,7 +1491,8 @@ predicate_bbs (loop_p loop)
 		case_cond = build2_loc (loc, EQ_EXPR,
 					boolean_type_node,
 					index,
-					CASE_LOW (gimple_switch_label (sw, i)));
+					fold_convert_loc (loc, TREE_TYPE (index),
+					CASE_LOW (label)));
 	      if (i > 1)
 		switch_cond = build2_loc (loc, TRUTH_OR_EXPR,
 					  boolean_type_node,
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|bit_test_cluster takes      |switch clustering takes
                   |extensive time with large   |extensive time with large
                   |switches even at -O0        |switches even at -O0

--- Comment #3 from ak at gcc dot gnu.org --- With -fno-bit-tests -fno-jump-tables it compiles reasonably fast. One bug is really that these two options are enabled by default even at -O0. tree-switch-conversion has some logic for this, but it seems to be broken. Second step would be to figure out how to improve the clustering algorithm scaling.
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 --- Comment #4 from ak at gcc dot gnu.org --- Here's a patch that enables the slow switch conversions only at -O2. With that the test case builds reasonably quickly.

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486de..4af7a94fea42 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2189,11 +2189,11 @@ Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
 
 fjump-tables
-Common Var(flag_jump_tables) Init(1) Optimization
+Common Var(flag_jump_tables) Init(-1) Optimization
 Use jump tables for sufficiently large switch statements.
 
 fbit-tests
-Common Var(flag_bit_tests) Init(1) Optimization
+Common Var(flag_bit_tests) Init(-1) Optimization
 Use bit tests for sufficiently large switch statements.
 
 fkeep-inline-functions
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index 6468995eb316..1cca23671d70 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -442,7 +442,7 @@ public:
   /* Return whether bit test expansion is allowed.  */
   static inline bool is_enabled (void)
   {
-    return flag_bit_tests;
+    return flag_bit_tests >= 0 ? flag_bit_tests : (optimize > 1);
   }
 
   /* True when the jump table handles an entire switch statement.  */
@@ -524,7 +524,8 @@ bool jump_table_cluster::is_enabled (void)
      over-ruled us, we really have no choice.  */
   if (!targetm.have_casesi () && !targetm.have_tablejump ())
     return false;
-  if (!flag_jump_tables)
+  int flag = flag_jump_tables >= 0 ? flag_jump_tables : (optimize > 1);
+  if (!flag)
     return false;
 #ifndef ASM_OUTPUT_ADDR_DIFF_ELT
   if (flag_pic)
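The patch above relies on a tri-state option value: -1 means the user gave no option, so the default is derived from the optimization level, while 0 and 1 are explicit user choices that always win. A standalone sketch of that resolution logic, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Resolve a tri-state option.  flag is -1 when neither -f... nor -fno-...
   was given, 0 for -fno-..., and 1 for -f....  An explicit setting wins;
   otherwise the feature is on only at -O2 and above. */
bool switch_feature_enabled(int flag, int optimize_level)
{
    return flag >= 0 ? flag != 0 : optimize_level > 1;
}
```

With the previous Init(1) default there was no way to tell "user asked for it" apart from "nobody said anything", which is why the feature stayed on at -O0.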
[Bug middle-end/117091] bit_test_cluster takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-10-11
            Summary|compile time Regression in  |bit_test_cluster takes
                   |GCC Trunk vs GCC 6.1        |extensive time with large
                   |                            |switches even at -O0
                 CC|                            |ak at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from ak at gcc dot gnu.org --- Problem seems to be in the bit test cluster detection:

87.71% cc1 [.] tree_switch_conversion::bit_test_cluster::can_be_handled(vec const&,
 5.73% cc1 [.] tree_switch_conversion::bit_test_cluster::find_bit_tests(vec&)
 4.78% cc1 [.] tree_switch_conversion::bit_test_cluster::can_be_handled(unsigned long, unsigned int)

Perhaps the bit_test_cluster check should depend on -O2, or needs some limit.
[Bug middle-end/117091] bit_test_cluster takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 --- Comment #2 from ak at gcc dot gnu.org --- Minimum patch. Only enable the clustering at -O2.

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 00426d46..468b15f1c461 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1375,7 +1375,8 @@ switch_conversion::expand (gswitch *swtch)
   gcc_checking_assert (TREE_TYPE (m_index_expr) != error_mark_node);
 
   /* Prefer bit test if possible.  */
-  if (tree_fits_uhwi_p (m_range_size)
+  if (optimize >= 2
+      && tree_fits_uhwi_p (m_range_size)
       && bit_test_cluster::can_be_handled (tree_to_uhwi (m_range_size), m_uniq)
       && bit_test_cluster::is_beneficial (m_count, m_uniq))
     {
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 ak at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #11 from ak at gcc dot gnu.org --- Adding RichardS for the late-combine issue.
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 --- Comment #10 from ak at gcc dot gnu.org --- https://github.com/andikleen/gcc/commit/9a71a4dbdd7094241bcdb0b89d7261c19dcc4b34 fixes the test case by checking early that bit clustering only works when multiple labels point to the same code. It still needs a limit on the clusters however. With that fix the test case shows a new issue in late combine undoing and redoing things constantly. Is that a known problem?

-   35.83%  cc1  cc1  [.] temporarily_undo_changes(int)
     35.82% temporarily_undo_changes(int)
        rtl_ssa::insn_info::calculate_cost() const
        rtl_ssa::changes_are_worthwhile(array_slice, bool)
        (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
        (anonymous namespace)::pass_late_combine::execute(function*)
        execute_one_pass(opt_pass*)
        execute_pass_list_1(opt_pass*)
      + execute_pass_list_1(opt_pass*)
-   34.44%  cc1  cc1  [.] redo_changes(int)
     34.43% rtl_ssa::insn_info::calculate_cost() const
        rtl_ssa::changes_are_worthwhile(array_slice, bool)
        (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
        (anonymous namespace)::pass_late_combine::execute(function*)
        execute_one_pass(opt_pass*)
        execute_pass_list_1(opt_pass*)
      + execute_pass_list_1(opt_pass*)
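To illustrate why bit tests only pay off when several labels share the same code, here is a hand-written sketch of the idea. It is not GCC's implementation and the function names are made up. A switch whose case labels all jump to one block can be replaced by a single mask test, as long as the label values fit into one machine word.

```c
#include <assert.h>
#include <stdbool.h>

/* Several case labels share one block of code. */
bool is_separator_switch(unsigned c)
{
    switch (c) {
    case 9: case 10: case 13: case 32: case 44: case 59:
        return true;
    default:
        return false;
    }
}

/* The same predicate as one range check plus one bit test.  Each label
   value becomes a bit in a 64-bit mask. */
bool is_separator_bittest(unsigned c)
{
    const unsigned long long mask =
        (1ULL << 9) | (1ULL << 10) | (1ULL << 13) |
        (1ULL << 32) | (1ULL << 44) | (1ULL << 59);
    return c < 64 && ((mask >> c) & 1ULL) != 0;
}
```

If every label led to different code, each target would need its own mask holding a single bit, and the bit test would be no better than a plain comparison.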
[Bug target/117312] RFE: x86 (and perhaps others): inline assembly: "red-zone" clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117312 --- Comment #5 from ak at gcc dot gnu.org --- Peter, can you construct a test case that demonstrates the problem?
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 ak at gcc dot gnu.org changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #20 from ak at gcc dot gnu.org --- Reopened because the patch with the new algorithm has been reverted due to PR117352. It doesn't take range comparisons into account, and probably needs to understand CCMP.
[Bug c++/117351] New: ICE while reporting invalid template error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117351 Bug ID: 117351 Summary: ICE while reporting invalid template error Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Target Milestone: --- While trying to reduce another problem I hit this: foo.cc: template <_Lp> struct __shared_ptr_access { template using __esft_base_t decltype(__enable_shared_from_this_base()); template __esft_base_t<_Yp> cc1plus foo.cc gu.cc:1:11: error: ‘_Lp’ has not been declared 1 | template <_Lp> struct __shared_ptr_access { | ^~~ gu.cc:3:23: error: expected ‘=’ before ‘decltype’ [-Wtemplate-body] 3 | using __esft_base_t decltype(__enable_shared_from_this_base()); | ^~~~ gu.cc:3:32: error: there are no arguments to ‘__enable_shared_from_this_base’ that depend on a template parameter, so a declaration of ‘__enable_shared_from_this_base’ must be available [-Wtemplate-body] 3 | using __esft_base_t decltype(__enable_shared_from_this_base()); |^~ gu.cc:3:32: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated) gu.cc:3:32: error: there are no arguments to ‘__enable_shared_from_this_base’ that depend on a template parameter, so a declaration of ‘__enable_shared_from_this_base’ must be available [-Wtemplate-body] gu.cc: In substitution of ‘template< > template using __shared_ptr_access< >::__esft_base_t = decltype (__enable_shared_from_this_base()) [with = _Yp; = ]’: gu.cc:4:44: required from here 4 | template __esft_base_t<_Yp> |^ gu.cc:3:62: error: ‘__enable_shared_from_this_base’ was not declared in this scope 3 | using __esft_base_t decltype(__enable_shared_from_this_base()); |~~^~ gu.cc:4:44: internal compiler error: Segmentation fault 4 | template __esft_base_t<_Yp> |^ 0x27adb7f internal_error(char const*, ...) 
../../gcc/gcc/diagnostic-global-context.cc:518 0x135ed66 crash_signal ../../gcc/gcc/toplev.cc:323 0x9fe1b7 tree_class_check(tree_node*, tree_code_class, char const*, int, char const*) ../../gcc/gcc/tree.h:3797 0x9fe1b7 pop_nested_class() ../../gcc/gcc/cp/class.cc:8636 0xc322d9 instantiate_template(tree_node*, tree_node*, int) ../../gcc/gcc/cp/pt.cc:22304 0xc33c8a instantiate_alias_template(tree_node*, tree_node*, int) [clone .part.0] [clone .lto_priv.0] ../../gcc/gcc/cp/pt.cc:22400 0xc07640 instantiate_alias_template ../../gcc/gcc/cp/pt.cc:22378 0xc07640 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int) ../../gcc/gcc/cp/pt.cc:10177 0xc4c037 finish_template_type(tree_node*, tree_node*, int) ../../gcc/gcc/cp/semantics.cc:4144 0xb92ced cp_parser_template_id ../../gcc/gcc/cp/parser.cc:19301 0xb92f65 cp_parser_class_name ../../gcc/gcc/cp/parser.cc:26973 0xb97d12 cp_parser_qualifying_entity ../../gcc/gcc/cp/parser.cc:7438 0xb97d12 cp_parser_nested_name_specifier_opt ../../gcc/gcc/cp/parser.cc:7124 0xbacdb4 cp_parser_constructor_declarator_p ../../gcc/gcc/cp/parser.cc:32800 0xbacdb4 cp_parser_decl_specifier_seq ../../gcc/gcc/cp/parser.cc:16872 0xbb7068 cp_parser_single_declaration ../../gcc/gcc/cp/parser.cc:33467 0xbb7662 cp_parser_template_declaration_after_parameters ../../gcc/gcc/cp/parser.cc:33228 0xbb7662 cp_parser_explicit_template_declaration ../../gcc/gcc/cp/parser.cc:33398 0xb86e10 cp_parser_member_specification_opt ../../gcc/gcc/cp/parser.cc:28187 0xb86e10 cp_parser_class_specifier ../../gcc/gcc/cp/parser.cc:27166 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions.
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #3 from ak at gcc dot gnu.org --- Reduced test case for an Intel platform: gu.cc: template class tuple; template struct tuple<_T1, _T2> { tuple(_T1, _T2); }; struct __uniq_ptr_impl { __uniq_ptr_impl(int __p, int) : _M_t(__p, int()) {} tuple _M_t; }; struct __uniq_ptr_data : __uniq_ptr_impl { __uniq_ptr_impl::__uniq_ptr_impl; }; template struct unique_ptr { __uniq_ptr_data _M_t; template unique_ptr(unique_ptr<_Up, _Ep>) : _M_t(0, _Ep()) {} }; template unique_ptr make_unique(); namespace { struct gcc_urlifier; } unique_ptr make_gcc_urlifier() { return make_unique(); } perf record -c 10003 -b -e br_inst_retired.all_branches:pu ./cc1plus -O2 -flto=jobserver gu.cc create_gcov -binary cc1plus --gcov_version 2 cc1plus -O2 -flto=jobserver -fauto-profile=fbdata.afdo gu.cc (note may need to change the perf event name on other CPUs, see perf list branch)
[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091 ak at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #19 from ak at gcc dot gnu.org --- Fixed for the switch part. There is still a problem with late combine, this is tracked in PR117297
[Bug middle-end/117352] New: switch bit test conversion makes comparison code worse
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117352

Bug ID: 117352
Summary: switch bit test conversion makes comparison code worse
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: ak at gcc dot gnu.org
Target Milestone: ---

With the change in PR117091 that makes switch bit test conversion more aggressive, I see a failure in gcc.dg/pr21643.c, which checks that tree reassoc happens. I fixed the test by using -fno-bit-tests. However, the more aggressive bit test conversion makes the generated code on aarch64 worse:

int f1 (unsigned char c)
{
  if (c == 0x22 || c == 0x20 || c < 0x20)
    return 1;
  return 0;
}

Before (with -fno-bit-tests or without the PR117091 change):

f1:
.LFB0:
        and     w0, w0, 255
        mov     w1, 32
        cmp     w0, 34
        ccmp    w0, w1, 0, ne
        cset    w0, ls
        ret

After:

f1:
.LFB0:
        and     w0, w0, 255
        mov     x1, -281449206906881
        movk    x1, 0x0, lsl 48
        cmp     w0, 35
        lsr     x0, x1, x0
        and     w0, w0, 1
        csel    w0, w0, wzr, cc
        ret

So I guess tree-reassoc needs to learn to handle bit test switch code better?
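For readers who want to see the two code shapes side by side, here is a minimal C sketch, not taken from the report. The function names f1_compare, f1_bittest and check_f1_forms are made up for illustration. The mask 0x5FFFFFFFF is the value the mov/movk pair in the After listing appears to build: bits 0..31 stand for c < 0x20, and bits 32 and 34 stand for 0x20 and 0x22.

```c
#include <assert.h>
#include <stdint.h>

/* Comparison form, as written in the test case from the report. */
int f1_compare(unsigned char c)
{
    if (c == 0x22 || c == 0x20 || c < 0x20)
        return 1;
    return 0;
}

/* Bit-test form (illustrative): bit i of the mask is set exactly when
   the value i should yield 1.  Values of 35 and above yield 0, which
   also keeps the shift count below 64.  */
int f1_bittest(unsigned char c)
{
    const uint64_t mask = 0x5FFFFFFFFULL;
    return c < 35 ? (int)((mask >> c) & 1) : 0;
}

/* Hypothetical helper: checks that both forms agree on every input. */
void check_f1_forms(void)
{
    for (int i = 0; i < 256; i++)
        assert(f1_compare((unsigned char)i) == f1_bittest((unsigned char)i));
}
```

Both forms compute the same result; the report is only about which one leads to better aarch64 code.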
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #5 from ak at gcc dot gnu.org --- Also the ICE had a truncated backtrace. Checking it in gdb gives the full one. The bad mangling happens while autofdo is reading the string table of the afdo file, and trying to generate the asm name of an arbitrary decl. #0 write_unscoped_name (decl=) at ../../gcc/gcc/cp/mangle.cc:1197 #1 0x00b23797 in write_unscoped_template_name (decl=) at ../../gcc/gcc/cp/mangle.cc:1215 #2 write_name (decl=, ignore_local_scope=) at ../../gcc/gcc/cp/mangle.cc:1122 #3 0x00b24a2a in write_encoding (decl=) at ../../gcc/gcc/cp/mangle.cc:939 #4 0x00b24c0a in write_mangled_name (decl=decl@entry=, top_level=top_level@entry=true) at ../../gcc/gcc/cp/mangle.cc:821 #5 0x00b2cba4 in mangle_decl_string (decl=decl@entry=) at ../../gcc/gcc/cp/mangle.cc:4428 #6 0x00b2cda8 in get_mangled_id (decl=) at ../../gcc/gcc/cp/mangle.cc:4449 #7 mangle_decl (decl=) at ../../gcc/gcc/cp/mangle.cc:4487 #8 0x016ad32b in decl_assembler_name (decl=) at ../../gcc/gcc/tree.cc:728 #9 0x024e1007 in autofdo::string_table::get_index_by_decl (this=0x34c0b9c0, decl=)
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 ak at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|1 |0 Status|WAITING |UNCONFIRMED CC||jason at redhat dot com --- Comment #4 from ak at gcc dot gnu.org --- This is the originally failing assert 1194 /* If not, it should be either in the global namespace, or directly 1195 in a local function scope. A lambda can also be mangled in the 1196 scope of a default argument. */ 1197 gcc_assert (context == global_namespace 1198 || TREE_CODE (context) == PARM_DECL 1199 || TREE_CODE (context) == FUNCTION_DECL); context is constant 8> unit-size constant 1> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7f17c25dd930 fields unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7f17c25d42a0 fields context pointer_to_this reference_to_this > used nonlocal decl_3 QI gu.cc:13:19 size unit-size align:8 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0 offset bit-offset context > context pointer_to_this reference_to_this >
[Bug other/117350] New: ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 Bug ID: 117350 Summary: ICE in pretty print during bootstrap Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Target Milestone: --- With --with-build-config=bootstrap-lto make autoprofiledbootstrap I get /home/ak/gcc/obj-auto/./prev-gcc/xg++ -B/home/ak/gcc/obj-auto/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/home/ak/gcc/obj-auto/prev-x86_64- pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/ak/gcc/gcc/libstdc+ +-v3/libsupc++ -L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fchecking=1 -flto=jobserver -frandom-seed=1 -DIN_GCC-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-error=narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-var iadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -fno-PIE -fauto-profile=cc1plus.fda -fauto-profile=cc1plus.fda -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. 
-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc /gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o gcc-urlifier.o -MT gcc-urlifier.o -MMD -MP -MF ./.deps/gcc-urlifier.TPo ../../gcc/gcc/gcc-ur lifier.cc during GIMPLE pass: einline In file included from /home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:37, from /home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/memory:80, from ../../gcc/gcc/system.h:766, from ../../gcc/gcc/gcc-urlifier.cc:23: in format_phase_2, at pretty-print.cc:2162 2088 | tuple() | ^ 0x27adb7f internal_error(char const*, ...) ../../gcc/gcc/diagnostic-global-context.cc:518 0x9ba31b fancy_abort(char const*, int, char const*) ../../gcc/gcc/diagnostic.cc:1580 0x9bd456 format_phase_2 ../../gcc/gcc/pretty-print.cc:2162 0x9bd456 pretty_printer::format(text_info&) ../../gcc/gcc/pretty-print.cc:1712 0x282b62e pp_format(pretty_printer*, text_info*) ../../gcc/gcc/pretty-print.h:602 0x282b62e pp_format_verbatim(pretty_printer*, text_info*) ../../gcc/gcc/pretty-print.cc:2340 0x282b62e pp_verbatim(pretty_printer*, char const*, ...) 
../../gcc/gcc/pretty-print.cc:2619 0xadef66 print_instantiation_full_context ../../gcc/gcc/cp/error.cc:3855 0xadef66 maybe_print_instantiation_context ../../gcc/gcc/cp/error.cc:4010 0xadef66 maybe_print_instantiation_context ../../gcc/gcc/cp/error.cc:4004 0x13c5e59 default_tree_diagnostic_text_starter ../../gcc/gcc/tree-diagnostic.cc:52 0x27ab84f diagnostic_text_output_format::on_report_diagnostic(diagnostic_info const&, diagnostic_t) ../../gcc/gcc/diagnostic-format-text.cc:207 0x27acf66 diagnostic_context::report_diagnostic(diagnostic_info*) ../../gcc/gcc/diagnostic.cc:1357 0x27ad2ce diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag (*) [1], diagnostic_t) ../../gcc/gcc/diagnostic.cc:1472 0x27adb7f internal_error(char const*, ...) ../../gcc/gcc/diagnostic-global-context.cc:518 0x9ba31b fancy_abort(char const*, int, char const*) ../../gcc/gcc/diagnostic.cc:1580 0x74fbd5 write_unscoped_name ../../gcc/gcc/cp/mangle.cc:1197 0xb23796 write_unscoped_template_name ../../gcc/gcc/cp/mangle.cc:1215 0xb23796 write_name ../../gcc/gcc/cp/mangle.cc:1122 0xb24a29 write_encoding ../../gcc/gcc/cp/mangle.cc:939 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report.
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #8 from ak at gcc dot gnu.org --- It's when reading the profile file, so stage 4 (?) The full log is here: http://firstfloor.org/~andi/l2
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #10 from ak at gcc dot gnu.org --- The small test case also fails with gcc 13.0 (although it doesn't have the nested ICE). So it's an old latent bug. gcc version 13.3.1 20240913 (Red Hat 13.3.1-3) (GCC) gcc -fauto-profile=fbdata.afdo gu.cc -O2 -flto gu.cc:10:3: warning: access declarations are deprecated in favour of using-declarations; suggestion: add the ‘using’ keyword [-Wdeprecated] 10 | __uniq_ptr_impl::__uniq_ptr_impl; | ^~~ gu.cc:17:37: warning: ‘unique_ptr make_unique() [with T = {anonymous}::gcc_urlifier]’ used but never defined 17 | template unique_ptr make_unique(); | ^~~ during GIMPLE pass: einline ‘ in pp_format, at pretty-print.cc:1478 15 | unique_ptr(unique_ptr<_Up, _Ep>) : _M_t(0, _Ep()) {} | ^~ Please submit a full bug report, with preprocessed source. See <http://bugzilla.redhat.com/bugzilla> for instructions. Preprocessed source stored into /tmp/cc3ZB46l.out file, please attach this to your bugreport.
[Bug bootstrap/117350] ICE in pretty print during bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #11 from ak at gcc dot gnu.org --- Given that it reproduces with the distribution gcc 13.0, I don't think it's a miscompilation.
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #15 from ak at gcc dot gnu.org --- I guess to debug this I have to figure out what's different about the decl between the non-autofdo case and the autofdo case. I tried to work around it by modifying the urlifier code to avoid the anonymous namespace, but it hits a similar bug later in gimple-range-fold.cc. Here is a full build log of that attempt: http://firstfloor.org/~andi/l

/home/ak/gcc/obj-auto/./prev-gcc/xg++ -B/home/ak/gcc/obj-auto/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/home/ak/gcc/gcc/libstdc++-v3/libsupc++ -L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fchecking=1 -flto=jobserver -frandom-seed=1 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-error=narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -fno-PIE -fauto-profile=cc1plus.fda -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o gimple-range-fold.o -MT gimple-range-fold.o -MMD -MP -MF ./.deps/gimple-range-fold.TPo ../../gcc/gcc/gimple-range-fold.cc

0x27a4b3f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518 0x9bb2a5 fancy_abort(char const*, int, char const*) ../../gcc/gcc/diagnostic.cc:1693 0x9bdd90 format_phase_2 ../../gcc/gcc/pretty-print.cc:2162 0x9bdd90 pretty_printer::format(text_info&) ../../gcc/gcc/pretty-print.cc:1712 0x282925e pp_format(pretty_printer*, text_info*) ../../gcc/gcc/pretty-print.h:602 0x282925e pp_format_verbatim(pretty_printer*, text_info*) ../../gcc/gcc/pretty-print.cc:2340 0x282925e pp_verbatim(pretty_printer*, char const*, ...) ../../gcc/gcc/pretty-print.cc:2619 0xae1ea3 print_instantiation_full_context ../../gcc/gcc/cp/error.cc:3807 0xae1ea3 maybe_print_instantiation_context ../../gcc/gcc/cp/error.cc:3962 0xae1ea3 maybe_print_instantiation_context ../../gcc/gcc/cp/error.cc:3956 0x13adc56 default_tree_diagnostic_text_starter ../../gcc/gcc/tree-diagnostic.cc:52 0x27a2ba0 diagnostic_text_output_format::on_report_diagnostic(diagnostic_info const&, diagnostic_t) ../../gcc/gcc/diagnostic-format-text.cc:210 0x27a3f62 diagnostic_context::report_diagnostic(diagnostic_info*) 0x27a434e diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag (*) [1], diagnostic_t) ../../gcc/gcc/diagnostic.cc:1585 0x27a4b3f internal_error(char const*, ...) ../../gcc/gcc/diagnostic-global-context.cc:518 0x9bb2a5 fancy_abort(char const*, int, char const*) ../../gcc/gcc/diagnostic.cc:1693 0x750805 write_unscoped_name ../../gcc/gcc/cp/mangle.cc:1197 0xb22496 write_unscoped_template_name ../../gcc/gcc/cp/mangle.cc:1215 0xb22496 write_name ../../gcc/gcc/cp/mangle.cc:1122 0xb23729 write_encoding ../../gcc/gcc/cp/mangle.cc:939 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions.
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #21 from ak at gcc dot gnu.org ---
Thanks. I'll see if this patch is enough:

diff --git a/gcc/tree.cc b/gcc/tree.cc
index b4c059d3b0db..92f99eaccd72 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -787,8 +787,9 @@ need_assembler_name_p (tree decl)
       || DECL_ASSEMBLER_NAME_SET_P (decl))
     return false;
 
-  /* Abstract decls do not need an assembler name. */
-  if (DECL_ABSTRACT_P (decl))
+  /* Abstract decls do not need an assembler name, except they
+     can be looked up by autofdo. */
+  if (DECL_ABSTRACT_P (decl) && !flag_auto_profile)
     return false;
 
   /* For VAR_DECLs, only static, public and external symbols need an
[Bug target/117312] RFE: x86 (and perhaps others): inline assembly: "red-zone" clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117312 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #3 from ak at gcc dot gnu.org --- This must be hit in lots of application code using inline asm? I wonder why no one complained.
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #17 from ak at gcc dot gnu.org --- http://firstfloor.org/~andi/fbdata.afdo is the gcov file for the reproducer above.
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #16 from ak at gcc dot gnu.org ---
I'm not sure the revision in the subject is right. Given the reproduction in gcc 13, it seems to me this is a latent bug that is just triggered by changes in the bootstrapped input source. Strangely, it is now triggered in at least two places, so something else might have changed.

The initial failure comes from this assert failing:

1194      /* If not, it should be either in the global namespace, or directly
1195         in a local function scope. A lambda can also be mangled in the
1196         scope of a default argument. */
1197      gcc_assert (context == global_namespace
1198                  || TREE_CODE (context) == PARM_DECL
1199                  || TREE_CODE (context) == FUNCTION_DECL);

When I look at it in rr I see

(rr) p context
$5 =
(rr) p decl
$7 =

This doesn't look like garbage from freed data, more like some logic problem.
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 --- Comment #18 from ak at gcc dot gnu.org --- Okay I looked into need_assembler_name_p. For __ct function_decl it bails out due to 784 /* If DECL already has its assembler name set, it does not need a 785 new one. */ 786 if (!HAS_DECL_ASSEMBLER_NAME_P (decl) 787 || DECL_ASSEMBLER_NAME_SET_P (decl)) 788 return false; > QI size unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fc0883d0930 method basetype arg-types chain chain chain >>>> pointer_to_this > used public abstract external QI ../gu.cc:3:3 align:16 warn_if_not_align:0 context abstract_origin full-name "tuple<_T1, _T2>::tuple(_T1, _T2) [with _T1 = int; _T2 = int]" template-info VOID ../gu.cc:3:3 align:1 warn_if_not_align:0 context result parms value length:2 elt:0 > elt:1 >>> full-name "template tuple<_T1, _T2>::tuple(_T1, _T2)"> args elt:1 > pending_template> use_template=1 chain > I assume that means HAS_DECL_ASSEMBLER_NAME_P returns false.
[Bug rtl-optimization/117297] New: late combine undoes too much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117297

Bug ID: 117297
Summary: late combine undoes too much
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ak at gcc dot gnu.org
Target Milestone: ---

Forked from PR117091.

When the (admittedly extreme) test case from PR117091 is compiled with -O2 -fno-bit-tests -fno-jump-tables (to work around the switch scalability issues), the compiler spends ~70% of the time in late combine doing:

-   39.85%  cc1  cc1  [.] temporarily_undo_changes(int)
     39.84% temporarily_undo_changes(int)
        rtl_ssa::insn_info::calculate_cost() const
      - rtl_ssa::changes_are_worthwhile(array_slice, bool)
         - 31.12% (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
              (anonymous namespace)::pass_late_combine::execute(function*)
              execute_one_pass(opt_pass*)
              execute_pass_list_1(opt_pass*)
              execute_pass_list_1(opt_pass*)
         - 8.72% (anonymous namespace)::pass_late_combine::execute(function*)
              execute_one_pass(opt_pass*)
              execute_pass_list_1(opt_pass*)
              execute_pass_list_1(opt_pass*)
-   32.56%  cc1  cc1  [.] redo_changes(int)
     32.56% rtl_ssa::insn_info::calculate_cost() const
      - rtl_ssa::changes_are_worthwhile(array_slice, bool)
         - 25.50% (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
              (anonymous namespace)::pass_late_combine::execute(function*)
              execute_one_pass(opt_pass*)
              execute_pass_list_1(opt_pass*)
              execute_pass_list_1(opt_pass*)
         - 7.06% (anonymous namespace)::pass_late_combine::execute(function*)
              execute_one_pass(opt_pass*)
              execute_pass_list_1(opt_pass*)
              execute_pass_list_1(opt_pass*)
[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350 ak at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #25 from ak at gcc dot gnu.org --- Patch is committed
[Bug c++/118277] g++ ICEs with depedent inline-asm string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118277 ak at gcc dot gnu.org changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #4 from ak at gcc dot gnu.org ---
Most likely it's latent; asm constexpr just reuses the existing constexpr machinery.

5271 static tree
5272 initialized_type (tree t)
5273 {
5274   if (TYPE_P (t))
5275     return t;
5276   tree type = TREE_TYPE (t);
5277   if (TREE_CODE (t) == CALL_EXPR)
5278     {
5279       /* A constructor call has void type, so we need to look deeper. */
5280       tree fn = get_function_named_in_call (t);
5281       if (fn && TREE_CODE (fn) == FUNCTION_DECL
5282           && DECL_CXX_CONSTRUCTOR_P (fn))
5283         type = DECL_CONTEXT (fn);
5284     }
5285   else if (TREE_CODE (t) == COMPOUND_EXPR)
5286     return initialized_type (TREE_OPERAND (t, 1));
5287   else if (TREE_CODE (t) == AGGR_INIT_EXPR)
5288     type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
5289   return cv_unqualified (type);
5290 }

but t is an unexpected scope_ref with no type, which is not handled:

(rr) pt t
template-info args readonly constant decl index 0 level 1 orig_level 1>>> full-name "struct to_str" no-binfo use_template=1 interface-unknown chain > arg:1 > ../../tsrc/conste.cc:5:23 start: ../../tsrc/conste.cc:5:23 finish: ../../tsrc/conste.cc:5:27>
[Bug tree-optimization/118279] gcc fails to eliminate unnecessary guards around switch()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118279 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org, ||amacleod at redhat dot com --- Comment #3 from ak at gcc dot gnu.org --- Seems like a ranger issue?
[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168 ak at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED CC||ak at gcc dot gnu.org
[Bug testsuite/117961] x86 testsuite: scan-assembler[-not] is bogus for inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117961 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #7 from ak at gcc dot gnu.org --- I suppose scan-assembler could just ignore lines starting with #.
[Bug tree-optimization/118443] New: [Meta bug] Bugs triggered by and blocking more smtgcc testing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118443 Bug ID: 118443 Summary: [Meta bug] Bugs triggered by and blocking more smtgcc testing Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Depends on: 113703, 117186, 117688, 118174 Target Milestone: --- Optimizations introducing undefined behavior. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703 [Bug 113703] ivopts miscompiles loop https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117186 [Bug 117186] [12/13/14 Regression] aarch64 wrong code for (a < b) < (b < a) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117688 [Bug 117688] [15 Regression] RISC-V: Wrong code for .SAT_SUB https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118174 [Bug 118174] [15 Regression] AArch64: Miscompilation at -O3 since r15-5943-gdc0dea98c96e02
[Bug translation/40883] [meta-bug] Translation breakage with trivial fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883 Bug 40883 depends on bug 80188, which changed state. Bug 80188 Summary: calls.c: reason argument to maybe_complain_about_tail_call must be marked for translation https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80188 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/118444] New: [Meta bug] musttail bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118444 Bug ID: 118444 Summary: [Meta bug] musttail bugs Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ak at gcc dot gnu.org Depends on: 115606, 115979, 116080, 116545, 118430, 118442 Target Milestone: --- Issues with the new musttail attribute. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115606 [Bug 115606] C++ front-end marks the return slot as addressable early on which prevents tail call being marked https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115979 [Bug 115979] Implicitly generated C++ calls stop musttail search early https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080 [Bug 116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116545 [Bug 116545] Support old style statement attributes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430 [Bug 118430] [14/15 Regression] tail call vs IPA-VRP return value range with constant value https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442 [Bug 118442] -fprofile-generate wrongly adds instrumentation after musttail call
[Bug translation/80188] calls.c: reason argument to maybe_complain_about_tail_call must be marked for translation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80188 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from ak at gcc dot gnu.org --- This has been fixed in gcc 15.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126 --- Comment #10 from ak at gcc dot gnu.org --- Okay it looks like the test case just avoids the if (...) return problem by replacing it with if (...) break. I guess the vectorizer should really be able to do that on its own.
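To make the two loop shapes concrete, here is a rough C sketch. It is not the test case attached to the bug: the function names, the explicit end pointer, and the exact set of searched characters are assumptions made for illustration. Both functions return the same pointer; they differ only in whether the loop is left by a return or by a break.

```c
/* Variant A: leaves the loop with an early return from the body. */
const char *find_special_return(const char *s, const char *end)
{
    for (; s < end; s++)
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            return s;
    return end;
}

/* Variant B: same search, but leaves the loop with break so that the
   function has a single return after the loop. */
const char *find_special_break(const char *s, const char *end)
{
    for (; s < end; s++)
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            break;
    return s;
}
```

Whether a given GCC version vectorizes either variant depends on the target and options; the sketch only shows the source-level difference discussed in the comment.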
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #9 from ak at gcc dot gnu.org ---
On x86/avx512f the first variant still fails with

search-line-fast.c:4:60: missed: couldn't vectorize loop
search-line-fast.c:4:60: missed: not vectorized: number of iterations cannot be computed.

and the second variant, with the end condition, fails with

search-line-fast-cond.c:3:18: missed: couldn't vectorize loop
search-line-fast-cond.c:3:18: missed: not vectorized: unsupported control flow in loop.
search-line-fast-cond.c:1:22: note: vectorized 0 loops in function.

The first needs some pattern matching: having the break condition in the loop vs. having it in a while header shouldn't matter. I think the latter is due to vect_analyze_loop_form:

|if (EDGE_COUNT (bbs[i]->succs) != 1

[local count: 1044213920]:
# prephitmp_25 = PHI <_24(4), 0(12)>
_10 = _1 == 92;
_13 = _10 | prephitmp_25;
if (_13 != 0)
  goto ; [8.03%]
else
  goto ; [91.97%]

[local count: 83800315]:
# s_19 = PHI
return s_19;

because the return isn't a jump out of the loop. I'm not sure how arm avoids that problem.
[Bug c++/118277] g++ ICEs with depedent inline-asm string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118277 --- Comment #6 from ak at gcc dot gnu.org --- Can you expand? None of the other callers of cp_parser_constant_expression seem to do anything special for templates.
[Bug tree-optimization/118198] tail merge/cross jump should not merge abort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118198

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
            Summary|tail merge should not merge |tail merge/cross jump
                   |abort                       |should not merge abort
         Resolution|INVALID                     |---
             Status|RESOLVED                    |NEW

--- Comment #10 from ak at gcc dot gnu.org ---
cfgcleanup special-cases sanitizer calls too. Again, the same could be done for __builtin_abort. Probably both should use a common function to check.

  /* For address sanitizer, never crossjump __asan_report_* builtins,
     otherwise errors might be reported on incorrect lines. */
  if (flag_sanitize & SANITIZE_ADDRESS)
    {
      rtx call = get_call_rtx_from (i1);
      if (call && GET_CODE (XEXP (XEXP (call, 0), 0)) == SYMBOL_REF)
        {
          rtx symbol = XEXP (XEXP (call, 0), 0);
          if (SYMBOL_REF_DECL (symbol)
              && TREE_CODE (SYMBOL_REF_DECL (symbol)) == FUNCTION_DECL)
            {
              if ((DECL_BUILT_IN_CLASS (SYMBOL_REF_DECL (symbol))
                   == BUILT_IN_NORMAL)
                  && DECL_FUNCTION_CODE (SYMBOL_REF_DECL (symbol))
                     >= BUILT_IN_ASAN_REPORT_LOAD1
                  && DECL_FUNCTION_CODE (SYMBOL_REF_DECL (symbol))
                     <= BUILT_IN_ASAN_STOREN)
                return dir_none;
            }
        }
    }
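As a reminder of why merging abort calls matters, here is a small C sketch. It is not the test case from the bug, and the function name checked_add is made up. The two abort calls are identical, so tail merging or cross jumping may fold them into one call site; if that happens, a backtrace or core dump can no longer tell which check failed.

```c
#include <stdlib.h>

/* Two distinct failure sites that end in the same noreturn call.
   An optimizer that merges identical tails may keep only one
   abort call, so both checks would report the same source line. */
int checked_add(int a, int b)
{
    if (a < 0)
        abort();   /* failure site 1 */
    if (b < 0)
        abort();   /* failure site 2 */
    return a + b;
}
```

Whether the merge actually happens depends on the compiler version and optimization level; comparing the generated assembly at -O0 and -O2 is one way to check.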
[Bug tree-optimization/118198] tail merge should not merge abort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118198 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org Ever confirmed|0 |1 Component|debug |tree-optimization Resolution|WONTFIX |--- Status|RESOLVED|NEW Summary|GCC wrong debug information |tail merge should not merge |bug |abort Last reconfirmed||2024-12-31
[Bug tree-optimization/118032] [15 regression] Bootstrap slowdown for risc-v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118032 --- Comment #29 from ak at gcc dot gnu.org --- We could also implement greedy switch clustering for jump tables I think. Right now it's only for the switch bitmap clustering.
[Bug target/118252] i386 should implement CASE_VECTOR_SHORTEN_MODE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118252 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #2 from ak at gcc dot gnu.org --- I suspect most switches are small, so even with a safety factor of 2 or 4 it would still be useful. Alternatively, we could push the decision to the assembler with some .if, but that would definitely be more code.
[Bug middle-end/118864] Add nomerge attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118864 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #3 from ak at gcc dot gnu.org ---
For cross jumping it would be in old_insns_match_p, I think.

FWIW I tried to write a patch there for noreturn, but it didn't fix the original issue in PR118198, likely due to some other code transformations.

diff --git a/gcc/cfgcleanup.cc b/gcc/cfgcleanup.cc
index d28d2323191..b784d5eca7a 100644
--- a/gcc/cfgcleanup.cc
+++ b/gcc/cfgcleanup.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see
 #include "dbgcnt.h"
 #include "rtl-iter.h"
 #include "regs.h"
+#include "calls.h"
 #include "function-abi.h"
 
 #define FORWARDER_BLOCK_P(BB) ((BB)->flags & BB_FORWARDER_BLOCK)
@@ -1207,6 +1208,11 @@ old_insns_match_p (int mode ATTRIBUTE_UNUSED, rtx_insn *i1, rtx_insn *i2)
       || SIBLING_CALL_P (i1) != SIBLING_CALL_P (i2))
     return dir_none;
 
+  /* Avoid merging noreturn to improve backtraces. */
+  if (rtx call = get_call_rtx_from (i1);
+      call && find_reg_note (call, REG_NORETURN, NULL))
+    return dir_none;
+
   /* For address sanitizer, never crossjump __asan_report_* builtins,
      otherwise errors might be reported on incorrect lines. */
   if (flag_sanitize & SANITIZE_ADDRESS)
diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index d897970079c..fc23672930d 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -1312,6 +1312,10 @@ merge_stmts_p (gimple *stmt1, gimple *stmt2)
   if (lookup_stmt_eh_lp_fn (cfun, stmt1) != lookup_stmt_eh_lp_fn (cfun, stmt2))
     return false;
 
+  /* Don't merge noreturn to give accurate backtraces. */
+  if (is_gimple_call (stmt1) && (gimple_call_flags (stmt1) & ECF_NORETURN))
+    return false;
+
   if (is_gimple_call (stmt1) && gimple_call_internal_p (stmt1))
     switch (gimple_call_internal_fn (stmt1))
[Bug gcov-profile/119375] Some autofdo test cases fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119375 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #5 from ak at gcc dot gnu.org --- Could someone bisect those failures please? I will need installing the autofdo tools.
[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628 --- Comment #2 from ak at gcc dot gnu.org --- The existing attributes could just handle this case?
[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500 --- Comment #9 from ak at gcc dot gnu.org ---

I can test it later, but it would surprise me if it helps. The problem is not the computation but the misses. When profiling it I see a lot of cache misses on the "cmp" memory load. So we likely need to do something about the data structure.

Looking at some LBR data, the list walks just seem to be too long. Several of the iterations exceeded the 32 entry limit of the Intel LBR. A 90+ cycle latency must be multiple cache misses. I saw up to 340 cycles just for the loop body.

E.g. here is an excerpt with cycle data:

01278705 jnz 0x12786e0                 # PRED 74 cycles [74]
012786e0 cmpw $0x2, (%rbx)
012786e4 jz 0x1278e20
012786ea movq 0x20(%rbx), %rbp
012786ee test %rbp, %rbp
012786f1 jz 0x12786fe
012786f3 cmpq $0x0, 0x38(%rbp)
012786f8 jnz 0x1278868
012786fe movq 0x10(%rbx), %rbx
01278702 test %rbx, %rbx
01278705 jnz 0x12786e0                 # PRED 78 cycles [152] 0.13 IPC
012786e0 cmpw $0x2, (%rbx)
012786e4 jz 0x1278e20
012786ea movq 0x20(%rbx), %rbp
012786ee test %rbp, %rbp
012786f1 jz 0x12786fe
012786f3 cmpq $0x0, 0x38(%rbp)
012786f8 jnz 0x1278868
012786fe movq 0x10(%rbx), %rbx
01278702 test %rbx, %rbx
01278705 jnz 0x12786e0                 # PRED 356 cycles [508] 0.03 IPC
012786e0 cmpw $0x2, (%rbx)
012786e4 jz 0x1278e20
012786ea movq 0x20(%rbx), %rbp
012786ee test %rbp, %rbp
012786f1 jz 0x12786fe
012786f3 cmpq $0x0, 0x38(%rbp)
012786f8 jnz 0x1278868
012786fe movq 0x10(%rbx), %rbx
01278702 test %rbx, %rbx
01278705 jnz 0x12786e0                 # PRED 24 cycles [532] 0.42 IPC
012786e0 cmpw $0x2, (%rbx)
012786e4 jz 0x1278e20
012786ea movq 0x20(%rbx), %rbp
012786ee test %rbp, %rbp
012786f1 jz 0x12786fe
012786f3 cmpq $0x0, 0x38(%rbp)
012786f8 jnz 0x1278868
012786fe movq 0x10(%rbx), %rbx
01278702 test %rbx, %rbx
01278705 jnz 0x12786e0                 # PRED 94 cycles [626] 0.11 IPC
012786e0 cmpw $0x2, (%rbx)
012786e4 jz 0x1278e20
012786ea movq 0x20(%rbx), %rbp
012786ee test %rbp, %rbp
012786f1 jz 0x12786fe
012786f3 cmpq $0x0, 0x38(%rbp)
012786f8 jnz 0x1278868
012786fe movq 0x10(%rbx), %rbx
01278702 test %rbx, %rbx
01278705 jnz 0x12786e0                 # PRED 70 cycles [696] 0.14 IPC
...
[Bug tree-optimization/119482] New: slow compilation on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

Bug ID: 119482
Summary: slow compilation on
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ak at gcc dot gnu.org
Target Milestone: ---

Created attachment 60892 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60892&action=edit input file

This is a file from the Ladybird browser. It uses flatten. With flatten gcc compilation is a lot slower (40+s) vs clang (6s). The ladybird developers had to disable it to not make the CI time out.

It doesn't look like a problem with the inliner, but the file just hitting general scaling limits. The profile is still fairly flat, but the top hot functions seem to be ranger and SSA related.

time g++-15 -ftime-report -std=gnu++20 -O2 interpreter.i -S -w

Time variable                        wall            GGC
 phase setup                    :   0.00 (  0%)  1952k (  0%)
 phase parsing                  :   0.73 (  2%)   237M ( 25%)
 phase lang. deferred           :   0.28 (  1%)    57M (  6%)
 phase opt and generate         :  41.67 ( 98%)   651M ( 69%)
 |name lookup                   :   0.12 (  0%)    11M (  1%)
 |overload resolution           :   0.29 (  1%)    64M (  7%)
 garbage collection             :   0.39 (  1%)      0 (  0%)
 dump files                     :   0.02 (  0%)      0 (  0%)
 callgraph construction         :   0.11 (  0%)    21M (  2%)
 callgraph optimization         :   0.18 (  0%)    52k (  0%)
 callgraph functions expansion  :  36.11 ( 85%)   369M ( 39%)
 callgraph ipa passes           :   5.23 ( 12%)   234M ( 25%)
 ipa function summary           :   0.08 (  0%)  3583k (  0%)
 ipa dead code removal          :   0.02 (  0%)      0 (  0%)
 ipa cp                         :   0.11 (  0%)  3444k (  0%)
 ipa inlining heuristics        :   0.19 (  0%)    13M (  1%)
 ipa function splitting         :   0.02 (  0%)   842k (  0%)
 ipa reference                  :   0.01 (  0%)      0 (  0%)
 ipa pure const                 :   0.04 (  0%)    32k (  0%)
 ipa icf                        :   0.04 (  0%)   2176 (  0%)
 ipa SRA                        :   0.04 (  0%)   738k (  0%)
 ipa modref                     :   0.03 (  0%)   793k (  0%)
 cfg construction               :   0.02 (  0%)  2180k (  0%)
 cfg cleanup                    :   0.78 (  2%)  5539k (  1%)
 trivially dead code            :   0.12 (  0%)      0 (  0%)
 df scan insns                  :   0.11 (  0%)    20k (  0%)
 df reaching defs               :   1.37 (  3%)      0 (  0%)
 df live regs                   :   3.91 (  9%)      0 (  0%)
 df live&initialized regs       :   4.46 ( 10%)      0 (  0%)
 df must-initialized regs       :   0.02 (  0%)      0 (  0%)
 df use-def / def-use chains    :   0.20 (  0%)      0 (  0%)
 df live reg subwords           :   0.04 (  0%)      0 (  0%)
 df reg dead/unused notes       :   0.61 (  1%)  5459k (  1%)
 register information           :   0.20 (  0%)      0 (  0%)
 alias analysis                 :   0.26 (  1%)    10M (  1%)
 alias stmt walking             :   2.70 (  6%)  2275k (  0%)
 register scan                  :   0.03 (  0%)   107k (  0%)
 rebuild jump labels            :   0.07 (  0%)      0 (  0%)
 preprocessing                  :   0.03 (  0%)  1942k (  0%)
 parser (global)                :   0.07 (  0%)    51M (  5%)
 parser struct body             :   0.10 (  0%)    31M (  3%)
 parser function body           :   0.07 (  0%)    11M (  1%)
 parser inl. func. body         :   0.04 (  0%)  7108k (  1%)
 parser inl. meth. body         :   0.14 (  0%)    34M (  4%)
 template instantiation         :   0.52 (  1%)   150M ( 16%)
 constant expression evaluation :   0.03 (  0%)  3423k (  0%)
 constraint satisfaction        :   0.02 (  0%)  2596k (  0%)
 early inlining heuristics      :   0.06 (  0%)  9725k (  1%)
 inline parameters              :   0.12 (  0%)  6552k (  1%)
 integration                    :   0.76 (  2%)   178M ( 19%)
 tree gimplify                  :   0.08 (  0%)    15M (  2%)
 tree eh                        :   0.07 (  0%)  6680k (  1%)
 tree CFG construction          :   0.02 (  0%)  8370k (  1%)
 tree CFG cleanup               :   1.04 (  2%)   600k (  0%)
 tree tail merge                :   0.07 (  0%)  2549k (  0%)
 tree VRP                       :   0.57 (  1%)  5979k (  1%)
 tree Early VRP                 :   0.34 (  1%)  7126k (  1%)
 tree copy propagation          :   0.20 (  0%)   111k (  0%)
 tree PTA                       :   1.73 (  4%)  5483k (  1%)
 tree SSA rewrite               :   0.03 (
[Bug middle-end/119482] slow compilation on ladybird interpreter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482 --- Comment #5 from ak at gcc dot gnu.org --- Also I should add that the Ladybird developers report a 40% performance improvement from adding flatten to clang.
[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500 --- Comment #11 from ak at gcc dot gnu.org --- Okay, it's not aliases, just all the decls of the scope. I think it would benefit from two lists: one list of marked decls, and another of yet-to-mark decls, so that the already marked bindings don't need to be re-walked.
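
A minimal sketch of the two-list idea, assuming a simplified model rather than GCC's actual binding structures. The names `Decl`, `Scope`, `needs_store`, `pending_` and `done_` are all hypothetical, and the sketch ignores how marks are cleared again when leaving the top level. It only illustrates that a second walk touches just the decls added since the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a declaration in a scope; not GCC's tree type.
struct Decl {
  std::string name;
  bool needs_store;  // stand-in for what a store_binding_p-style check reports
};

// A scope that partitions its decls so processed ones are never re-walked.
class Scope {
public:
  void add(Decl d) { pending_.push_back(std::move(d)); }

  // Visit only the not-yet-processed decls, collect the names that need
  // storing, and move everything visited to the processed list.
  // Returns how many decls were walked in this call.
  std::size_t store_bindings(std::vector<std::string> &out) {
    const std::size_t walked = pending_.size();
    for (Decl &d : pending_) {
      if (d.needs_store)
        out.push_back(d.name);
      done_.push_back(std::move(d));
    }
    pending_.clear();
    return walked;
  }

  std::size_t processed() const { return done_.size(); }

private:
  std::vector<Decl> pending_;  // yet-to-mark decls
  std::vector<Decl> done_;     // already marked decls
};
```

With this layout the cost of each walk is proportional to the number of new decls rather than to the size of the whole scope, which is the property the comment is after.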
[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500 --- Comment #10 from ak at gcc dot gnu.org ---

I misidentified the hot loop, it's actually this one in store_bindings:

  for (t = names; t; t = TREE_CHAIN (t))
    {
      if (TREE_CODE (t) == TREE_LIST)
        id = TREE_PURPOSE (t);
      else
        id = DECL_NAME (t);

      if (store_binding_p (id))
        bindings_need_stored.safe_push (id);
    }

So it's a list of aliases that can get long?

From the LBR log store_binding_p is near always false. Perhaps the list of ids that need to be stored can be cached?
[Bug middle-end/119482] slow compilation on ladybird interpreter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482 --- Comment #3 from ak at gcc dot gnu.org ---

I ran a full comparison now. There is actually a significant regression between g++-13 and g++-14, but -15 is roughly the same as -14. All are significantly slower than clang:

clang++-19 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2 ran
  1.17 ± 0.19 times faster than clang++-18 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2
  5.10 ± 0.51 times faster than g++-13 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2
  5.91 ± 0.60 times faster than g++-15 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2
  6.15 ± 0.61 times faster than g++-14 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2

As for flatten in clang: just based on cfi_startproc, gcc actually generates more functions:

% grep -c cfi_startproc interpreter-clang.s
570
% grep -c cfi_startproc interpreter-gcc.s
610

but gcc indeed generates much more code:

   text    data     bss     dec     hex filename
 311591    1536       1  313128   4c728 interpreter-clang.o
 783346       8       2  783356   bf3fc interpreter-gcc.o

So yes, there might be a difference in flatten semantics.

I'm attaching an input file that works for clang if you want to look yourself.
[Bug middle-end/119482] slow compilation on ladybird interpreter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482 --- Comment #4 from ak at gcc dot gnu.org --- Created attachment 60902 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60902&action=edit input file for clang testing
[Bug middle-end/119482] slow compilation on ladybird interpreter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482 --- Comment #9 from ak at gcc dot gnu.org --- For the ICE I'm not sure why I'm not seeing it. The input file should have had flatten enabled.
[Bug c++/119387] [14/15 Regression] Regression in performance by a factor of 6 when building with debugging symbols since r14-5979
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119387 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #15 from ak at gcc dot gnu.org ---

With the PR114563 alloc_page free list patch I get

../obj-fast/gcc/cc1plus-allocpage -std=gnu++20 -O2 pr119387.cc -quiet ran
  1.04 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119387.cc -quiet
  2.63 ± 0.01 times faster than ../obj-fast/gcc/cc1plus-allocpage -std=gnu++20 -O2 pr119387.cc -quiet -ggdb
  2.78 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119387.cc -quiet -ggdb
[Bug middle-end/114563] ggc_internal_alloc is slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114563 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #13 from ak at gcc dot gnu.org ---

>so my idea was to have multiple freelists so that p->bytes == entry_size
>and this list walk, which is the bottleneck for PR119387 I think, is
>improved.

It should be improved because the first element will near always match. I only kept the comparison for the fallback case: if there is no free list for a given size, it puts the size into freelist[0]. But I'm not sure this can actually happen (for simple tests it never triggers). If the fallback is removed the comparison could be removed too, but it probably doesn't matter for performance.

>Using your patch this changes to
>Samples: 1M of event 'cycles:Pu', Event count (approx.): 1053172130606
> 0.02% 234 cc1plus cc1plus [.] alloc_page(unsigned int)
>so the patch works as intended!

Great. I will submit it for phase 1 if I don't forget.
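
To illustrate the per-size free list idea discussed above, here is a rough sketch under simplifying assumptions. It is not the actual alloc_page patch: `Page`, `PageCache`, `take` and `release` are hypothetical names, and a hash map keyed by size replaces the fixed array of free lists with a `freelist[0]` fallback that the comment describes. The point is only that a lookup no longer has to walk past pages of the wrong size.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached page; only its size matters here.
struct Page {
  std::size_t bytes;
};

// One free list per page size, so a hit never needs to compare sizes
// against unrelated entries.
class PageCache {
public:
  // Return a cached page of exactly `bytes`, or nullptr if none is cached.
  Page *take(std::size_t bytes) {
    auto it = free_.find(bytes);
    if (it == free_.end() || it->second.empty())
      return nullptr;
    Page *p = it->second.back();
    it->second.pop_back();
    return p;
  }

  // Put a page back on the list that matches its size.
  void release(Page *p) { free_[p->bytes].push_back(p); }

private:
  std::unordered_map<std::size_t, std::vector<Page *>> free_;
};
```

A caller would try `take(entry_size)` first and fall back to allocating a fresh page on a miss.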
[Bug middle-end/114563] ggc_internal_alloc is slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114563 --- Comment #14 from ak at gcc dot gnu.org ---

>to do this for entry_size < G.pagesize * GGC_QUIRE_SIZE, this should
>avoid fragmenting the virtual address space. Possibly do this only
>for USING_MADVISE, not sure.

Okay let me test that.
[Bug middle-end/119482] slow compilation on ladybird interpreter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482 --- Comment #8 from ak at gcc dot gnu.org ---

The workload does a lot of bitmap manipulations:

#
  5.62% cc1plus cc1plus [.] bitmap_and_into(bitmap_head*, bitmap_head const*)
  5.30% cc1plus cc1plus [.] bitmap_element_allocate(bitmap_head*)
  3.93% cc1plus cc1plus [.] bitmap_ior_into(bitmap_head*, bitmap_head const*)
  3.85% cc1plus cc1plus [.] bitmap_list_find_element(bitmap_head*, unsigned int)
  2.09% cc1plus cc1plus [.] bitmap_and(bitmap_head*, bitmap_head const*, bitmap_head const*)
  1.77% cc1plus cc1plus [.] bitmap_elt_ior(bitmap_head*, bitmap_element*, bitmap_element*, bitmap_element const*, bitmap_element const*>
  1.44% cc1plus cc1plus [.] bitmap_set_bit(bitmap_head*, int)
  1.43% cc1plus cc1plus [.] bitmap_copy(bitmap_head*, bitmap_head const*)
  1.41% cc1plus cc1plus [.] bitmap_ior_and_compl(bitmap_head*, bitmap_head const*, bitmap_head const*, bitmap_head const*)
  1.33% cc1plus cc1plus [.] bitmap_elt_copy(bitmap_head*, bitmap_element*, bitmap_element*, bitmap_element const*, bool)

Looking at the samples, most of it seems to be cache misses of some sort (working set too big), but bitmap_set_bit stands out by having a misprediction.

This simple patch improves runtime by 15%. That is more than I expected given it only has ~1.44% of the cycles, but I guess the mispredicts caused some downstream effects.

../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet ran
  1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet

Most callers of bitmap_set_bit don't need the return value, but with the conditional store the CPU still has to predict it correctly, since gcc doesn't know how to do that without APX (even though CMOV could do it with a dummy target). If we make the write unconditional the memory bandwidth increases, but that is made up for by fewer mispredictions.

From the performance counter results it doesn't do much to the bandwidth, but reduces the number of branches drastically. Even though the misprediction rate goes up, far fewer cycles are wasted because there are fewer branches.

$ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet -w

 Performance counter stats for 'system wide':

    41,932,957,091      branches
       686,117,623      branch-misses        # 1.64% of all branches
         43,690.47 MiB  uncore_imc/cas_count_read/
         12,362.56 MiB  uncore_imc/cas_count_write/

      49.328633365 seconds time elapsed

$ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet -w

 Performance counter stats for 'system wide':

    37,092,113,179      branches
       663,641,708      branch-misses        # 1.79% of all branches
         43,196.52 MiB  uncore_imc/cas_count_read/
         12,369.33 MiB  uncore_imc/cas_count_write/

      42.632458350 seconds time elapsed

Patch:

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index f5a64b495ab3..7744f8f8c2e4 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
   if (ptr != 0)
     {
       bool res = (ptr->bits[word_num] & bit_val) == 0;
-      if (res)
-        ptr->bits[word_num] |= bit_val;
+      ptr->bits[word_num] |= bit_val;
       return res;
     }
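
The change in the patch can be shown in isolation on a single word. The following is a small sketch, not the GCC code: `set_bit_branchy` and `set_bit_unconditional` are hypothetical names, and a plain `std::uint64_t` stands in for one element of `ptr->bits`. Both variants return the same value and leave the word in the same state. They differ only in whether the store depends on a branch.

```cpp
#include <cassert>
#include <cstdint>

// Conditional store: the write only happens when the bit was clear,
// so there is a data-dependent branch in front of the store.
inline bool set_bit_branchy(std::uint64_t &word, unsigned bit) {
  const std::uint64_t mask = std::uint64_t{1} << bit;
  const bool was_clear = (word & mask) == 0;
  if (was_clear)
    word |= mask;
  return was_clear;
}

// Unconditional store: always write the word back. OR-ing in a bit that
// is already set is a no-op, so the final state is identical.
inline bool set_bit_unconditional(std::uint64_t &word, unsigned bit) {
  const std::uint64_t mask = std::uint64_t{1} << bit;
  const bool was_clear = (word & mask) == 0;
  word |= mask;
  return was_clear;
}
```

The unconditional form writes to memory even when nothing changes, which is the bandwidth trade-off mentioned in the comment.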
[Bug target/119628] New: Need better mechanisms to manage register saves in callee for tail calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

Bug ID: 119628
Summary: Need better mechanisms to manage register saves in callee for tail calls
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ak at gcc dot gnu.org
Target Milestone: ---
Target: x86_64

Created attachment 60997 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60997&action=edit toy byte code interpreter

To compile the test case use

One use case for the musttail feature is to write threaded interpreters with individual small functions, each implementing a byte code and calling the next function in the byte code program using musttail. This is a replacement for an older code style that put all these byte code handlers into a large function and called them using indirect goto. See the attached test case as an example.

This works fine for small functions that fit into the callee scratch registers in the x86-64 ABI. But when you have more complex functions that need more registers, the individual functions start saving/restoring the registers that are supposed to be callee saved (this is simulated using inline asm in the test case, thanks to Andrew Pinski for that trick). You can see that in the test case if you make SAVE_REGS/DONT_SAVE_REGS empty: there are lots of extra push/pops on each opcode.

Now this can be changed by modifying the calling convention, as is done in the unmodified test case. The original caller of the byte code can save all registers and the rest of the tail called byte code functions none. LLVM has preserve_none/most/all for this and it is used in the field for this.

When the tail called functions are not called through pointers, gcc has -fipa-ra for static functions, which should take care of it. But unfortunately this only works for direct calls, because for indirect calls the IPA cgraph RTL mechanism doesn't work.

gcc has no_callee_saved_registers/no_caller_saved_registers, which were originally developed for a different use case (fast interrupt handlers in an OS) but can modify the callee register saving. The main drawback of them is that they require -mgeneral-regs-only (as they were designed for an OS), which makes it impossible to use floating point in the interpreter code. While this works for the toy example, it's probably a show stopper for real interpreters.

Another problem with them is that they don't affect the caller, unlike the LLVM attributes. Luckily for the tail call case the shrink wrapping code takes care of this, although it's a problem if the byte code functions are called non tail for some reason (e.g. in the first function of the interpreter), as well as for other use cases (e.g. to use them to optimize calling of general cold functions).

gcc should:
- support no_callee_saved_registers/no_caller_saved_registers without -mgeneral-regs-only (there might be already bugs for this, but I'm filing it separately to track the particular use case)
- figure out how -fipa-ra can be made to work for indirects? (maybe with some type based analysis)
- Make the attributes affect the caller
- Do we need an equivalent of preserve_most?
- Once no_callee/caller_saved_registers work similar to clang, perhaps they should be aliased for compatibility.
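
For readers unfamiliar with the coding style described in this report, the following is a small sketch of a tail-call threaded interpreter. It is not the attached test case: `Insn`, `Handler`, `op_add`, `op_mul`, `op_halt`, `run` and the `MUSTTAIL` macro are hypothetical names chosen for illustration. The macro expands to a musttail attribute only where the compiler reports support for one, and to nothing otherwise, in which case the calls are ordinary calls that the optimizer may or may not turn into tail calls. The register save attributes discussed above are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Use a musttail attribute when the compiler advertises one (assumption:
// recent clang, or a gcc new enough to support musttail).
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(gnu::musttail)
#    define MUSTTAIL [[gnu::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct Insn;
// Every handler has the same signature so each can tail call the next.
using Handler = std::int64_t(const Insn *pc, std::int64_t acc);

// One byte code instruction: a pointer to its handler plus an operand.
struct Insn {
  Handler *fn;
  std::int64_t arg;
};

// Each handler does its work, advances pc, and jumps to the next handler
// through the function pointer stored in the next instruction.
std::int64_t op_add(const Insn *pc, std::int64_t acc) {
  acc += pc->arg;
  ++pc;
  MUSTTAIL return pc->fn(pc, acc);
}

std::int64_t op_mul(const Insn *pc, std::int64_t acc) {
  acc *= pc->arg;
  ++pc;
  MUSTTAIL return pc->fn(pc, acc);
}

// Ends the chain of tail calls and returns the accumulator.
std::int64_t op_halt(const Insn *, std::int64_t acc) { return acc; }

// Entry point: the only non-tail call into the handler chain.
std::int64_t run(const Insn *program, std::int64_t acc) {
  return program->fn(program, acc);
}
```

Because every dispatch goes through a function pointer, this is exactly the indirect call case where, according to the report, -fipa-ra cannot help and the calling convention has to be adjusted by attributes instead.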
[Bug c++/64500] push_to_top_level() shows up high during Chromium build.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500 ak at gcc dot gnu.org changed: What|Removed |Added CC||ak at gcc dot gnu.org --- Comment #4 from ak at gcc dot gnu.org ---

I see the same when building things like LLVM with a modern gcc. push_to_top_level consistently uses 5-6% of the CPU time and is by far the most expensive function of the compiler.

The hot comparison is the global_scope_p check below. Need a better data structure?

  /* Have to include the global scope, because class-scope decls
     aren't listed anywhere useful. */
  for (; b; b = b->level_chain)
    {
      tree t;

      /* Template IDs are inserted into the global level. If they were
         inserted into namespace level, finish_file wouldn't find them
         when doing pending instantiations. Therefore, don't stop at
         namespace level, but continue until :: . */
      if (global_scope_p (b))
        break;
[Bug c++/64500] push_to_top_level() shows up high during Chromium build.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500 ak at gcc dot gnu.org changed: What|Removed |Added Version|5.0 |14.0 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2025-03-24