[Bug target/56619] i386 hle atomic intrinsics flags are undocumented

2013-03-14 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56619

--- Comment #2 from ak at gcc dot gnu.org 2013-03-15 04:31:53 UTC ---
Author: ak
Date: Fri Mar 15 04:31:43 2013
New Revision: 196671

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196671
Log:
Document HLE / RTM intrinsics

The TSX HLE/RTM intrinsics were missing documentation. Add this to the
manual.

gcc/:

2013-03-14  Andi Kleen

PR target/56619
* doc/extend.texi: Document __ATOMIC_HLE_ACQUIRE,
__ATOMIC_HLE_RELEASE. Document __builtin_ia32 TSX intrinsics.
Document _x* TSX intrinsics.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/extend.texi
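
As a hedged illustration of what the documented flags are for, here is a minimal C sketch of a spin lock that ORs the HLE bits into the usual acquire/release memory orders. The names hle_lock, hle_unlock, locked_increment and the HLE_ACQ/HLE_REL macros are invented for this example. Where the compiler does not define __ATOMIC_HLE_ACQUIRE, the sketch assumes a plain acquire/release lock is an acceptable fallback.

```c
/* Fall back to "no modifier" where the HLE bits are not defined
   (non-x86 targets or other compilers).  This fallback is an
   assumption of the sketch, not something the commit prescribes.  */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQ __ATOMIC_HLE_ACQUIRE
#define HLE_REL __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQ 0
#define HLE_REL 0
#endif

static int lockvar;   /* 0 = free, 1 = held */
static int counter;   /* data protected by lockvar */

static void hle_lock (int *l)
{
  /* The HLE hint is ORed into an ordinary acquire exchange.  */
  while (__atomic_exchange_n (l, 1, __ATOMIC_ACQUIRE | HLE_ACQ))
    while (__atomic_load_n (l, __ATOMIC_RELAXED))
      ;  /* spin read-only until the lock looks free */
}

static void hle_unlock (int *l)
{
  /* The release store carries the matching HLE release hint.  */
  __atomic_store_n (l, 0, __ATOMIC_RELEASE | HLE_REL);
}

int locked_increment (void)
{
  hle_lock (&lockvar);
  int v = ++counter;
  hle_unlock (&lockvar);
  return v;
}
```

The lock behaves like a normal spin lock either way; the HLE bits are only a hint that hardware with TSX may use to elide the lock.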


[Bug target/55139] __atomic store does not support __ATOMIC_HLE_RELEASE

2012-11-09 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139

--- Comment #5 from ak at gcc dot gnu.org 2012-11-09 15:24:32 UTC ---
Author: ak
Date: Fri Nov  9 15:24:25 2012
New Revision: 193363

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193363
Log:
Handle target specific memory models in C frontend

get_atomic_generic_size would error out for
__atomic_store(...,__ATOMIC_HLE_RELEASE)

Just mask it out. All the memory orders are checked completely
in builtins.c anyways.

I'm not sure what that check is for, it could be removed in theory.

Passed bootstrap and test suite on x86-64

gcc/c-family/:

2012-11-09  Andi Kleen

PR 55139
* c-common.c (get_atomic_generic_size): Mask with
MEMMODEL_MASK

Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-common.c
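
The masking idea can be sketched in a few lines of C. This is not the code from c-common.c: the constants SKETCH_MEMMODEL_MASK and SKETCH_HLE_RELEASE, the bit positions, and the range check itself are assumptions chosen for illustration. It only shows why a range check on the raw value rejects a model with a target-specific bit set, while the masked value passes.

```c
/* Illustrative layout only: low 16 bits hold the base memory order,
   higher bits hold target-specific modifiers.  */
#define SKETCH_MEMMODEL_MASK ((1 << 16) - 1)
#define SKETCH_HLE_RELEASE   (1 << 17)

/* Range check on the raw value: fails once a modifier bit is set.  */
int model_in_range_raw (int model)
{
  return model >= __ATOMIC_RELAXED && model <= __ATOMIC_SEQ_CST;
}

/* Same check after masking off the modifier bits.  */
int model_in_range_masked (int model)
{
  int base = model & SKETCH_MEMMODEL_MASK;
  return base >= __ATOMIC_RELAXED && base <= __ATOMIC_SEQ_CST;
}
```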


[Bug lto/46905] -flto -fno-lto does not disable lto

2010-12-19 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46905

--- Comment #3 from ak at gcc dot gnu.org 2010-12-19 19:36:29 UTC ---
Author: ak
Date: Sun Dec 19 19:36:25 2010
New Revision: 168071

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168071
Log:
Fix -fno-lto (PR lto/46905)

gcc/

2010-12-19  Andi Kleen

PR lto/46905
* collect2.c (main): Handle -fno-lto.
* opts.c (common_handle_option): Handle -fno-lto.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/collect2.c
trunk/gcc/opts.c


[Bug lto/50679] Linux kernel LTO tracking bug

2011-10-09 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50679

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #1 from ak at gcc dot gnu.org 2011-10-09 14:06:44 UTC ---
Currently I have to revert 2 patches to get anywhere near a build
and work around 50644. 32bit builds don't work at all because of
the tree-nrv problem.

50620 causes incredibly slow builds because partitioning has to be disabled


[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map

2011-10-17 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636

--- Comment #15 from ak at gcc dot gnu.org 2011-10-17 14:43:45 UTC ---
Author: ak
Date: Mon Oct 17 14:43:37 2011
New Revision: 180093

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180093
Log:
Use MADV_DONTNEED for freeing in garbage collector

Use the Linux MADV_DONTNEED call to unmap free pages in the garbage
collector. Then keep the unmapped pages in the free list. This avoids
excessive memory fragmentation on large LTO builds, which can lead
to gcc bumping into the Linux vm.max_map_count limit per process.

gcc/:

2011-10-08  Andi Kleen  

PR other/50636
* config.in, configure: Regenerate.
* configure.ac (madvise): Add to AC_CHECK_FUNCS.
* ggc-page.c (USING_MADVISE): Add.
(page_entry): Add discarded field.
(alloc_page): Check for discarded pages.
(release_pages): Add USING_MADVISE branch.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config.in
trunk/gcc/configure
trunk/gcc/configure.ac
trunk/gcc/ggc-page.c
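
For readers unfamiliar with the call, the following is a minimal C sketch of the underlying mechanism, assuming Linux and a private anonymous mapping. It is not the ggc-page.c change; the function name discard_and_reread is made up. The point is that the pages are given back to the kernel while the mapping itself stays in place, so the address range can be reused without creating new map entries.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and madvise on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous page, dirty it, discard its contents with
   MADV_DONTNEED while keeping the mapping, then read it back.
   Returns the byte read back, or -1 on error.  */
int discard_and_reread (void)
{
  size_t len = (size_t) sysconf (_SC_PAGESIZE);
  unsigned char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  p[0] = 42;			/* dirty the page */
  if (madvise (p, len, MADV_DONTNEED) != 0)
    {
      munmap (p, len);
      return -1;
    }

  int v = p[0];			/* mapping is still valid; page is refilled */
  munmap (p, len);
  return v;
}
```

On Linux the read after MADV_DONTNEED sees a fresh zero-filled page, which is why a collector can keep such pages on its free list and hand them out again later.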


[Bug middle-end/88573] 9 regression: error: type mismatch in component reference

2018-12-23 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88573

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||ak at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #2 from ak at gcc dot gnu.org ---
Dup

*** This bug has been marked as a duplicate of bug 88140 ***

[Bug lto/88140] [9 Regression] ICE: verify_gimple failed since r266325

2018-12-23 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88140

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #10 from ak at gcc dot gnu.org ---
*** Bug 88573 has been marked as a duplicate of this bug. ***

[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2017-12-11 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

--- Comment #2 from ak at gcc dot gnu.org ---
Author: ak
Date: Mon Dec 11 16:13:53 2017
New Revision: 255540

URL: https://gcc.gnu.org/viewcvs?rev=255540&root=gcc&view=rev
Log:
Fix stack overflow with autofdo (PR83355)

g++.dg/bprob* is failing currently with autofdo.

Running in gdb shows that there is a very deep recursion in get_index_by_decl
until it overflows the stack.

gcc/:
2017-12-11  Andi Kleen  

PR gcov-profile/83355
* auto-profile.c (string_table::get_index_by_decl): Don't
recurse when abstract origin points to itself.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/auto-profile.c
trunk/gcc/lto-streamer-in.c
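
The shape of the fix can be sketched as follows. This is a simplified stand-in, not the auto-profile.c code: struct decl, its fields, and sketch_get_index_by_decl are hypothetical. It only illustrates refusing to recurse when the abstract origin points back at the same node.

```c
#include <stddef.h>

/* Hypothetical stand-in for a declaration node.  */
struct decl
{
  int index;			/* -1 if not present in the table */
  struct decl *abstract_origin;	/* may be NULL or point to itself */
};

int sketch_get_index_by_decl (const struct decl *d)
{
  if (d->index >= 0)
    return d->index;
  /* Only follow the origin if it leads somewhere new; a node whose
     origin is itself would otherwise recurse forever.  */
  if (d->abstract_origin != NULL && d->abstract_origin != d)
    return sketch_get_index_by_decl (d->abstract_origin);
  return -1;
}
```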

[Bug c++/55223] [C++11] Default lambda expression of a templated class member

2013-01-20 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55223

--- Comment #2 from ak at gcc dot gnu.org 2013-01-20 19:03:29 UTC ---
Author: ak
Date: Sun Jan 20 19:03:22 2013
New Revision: 195321

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195321
Log:
libstdc++: Add mem_order_hle_acquire/release to atomic.h v2

The underlying compiler supports additional __ATOMIC_HLE_ACQUIRE/RELEASE
memmodel flags for TSX, but this was not exposed to the C++ wrapper.
Handle it there.

These are additional flags, so some of the assert checks need to mask
off the flags before checking the memory model type.

libstdc++-v3/:

2013-01-12  Andi Kleen
Jonathan Wakely

PR libstdc++/55223
* include/bits/atomic_base.h (__memory_order_modifier): Add
__memory_order_mask, __memory_order_modifier_mask,
__memory_order_hle_acquire, __memory_order_hle_release.
(operator|,operator&): Add.
(__cmpexch_failure_order):  Rename to __cmpexch_failure_order2.
(__cmpexch_failure_order): Add.
(clear, store, load, compare_exchange_weak, compare_exchange_strong):
Handle flags.
* testsuite/29_atomics/atomic_flag/test_and_set/explicit-hle.cc:
Add.

Added:
trunk/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit-hle.cc
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/include/bits/atomic_base.h


[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2017-05-12 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #6 from ak at gcc dot gnu.org ---
Author: ak
Date: Fri May 12 10:09:50 2017
New Revision: 247962

URL: https://gcc.gnu.org/viewcvs?rev=247962&root=gcc&view=rev
Log:
Limit perf data buffer during profiling

With high -j parallelism the autofdo tests can randomly fail.
autofdo uses Linux perf to record profiling data.
Linux perf uses a locked perf buffer. By default it has
around 516k buffer per uid (/proc/sys/kernel/perf_event_mlock_kb).

An individual perf record tries to grab the full 516k,
which makes parallel perf record fail.

This patch limits the perf buffer for individual perf record to 8k.
With the default settings this allows a parallelism of the test
cases of 16, which is hopefully good enough.

(If not, we would need to add some kind of semaphore, or ask
the user to increase the limit as root.)

I also removed an unneeded -o perf.data option.

Thanks to Marcin for finally spotting the problem.

Passes bootstrap and test on x86_64-linux. Ok for trunk?

gcc/testsuite/:

2017-05-12  Andi Kleen  

PR testsuite/77684
* lib/target-supports.exp (profopt-perf-wrapper):
Add -m8 option to increase parallelism.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/lib/target-supports.exp

[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335

2014-11-10 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804

--- Comment #11 from ak at gcc dot gnu.org ---
Author: ak
Date: Tue Nov 11 05:10:58 2014
New Revision: 217336

URL: https://gcc.gnu.org/viewcvs?rev=217336&root=gcc&view=rev
Log:
Error out for Cilk_spawn or array expression in forbidden places

_Cilk_spawn or Cilk array expressions are only allowed on their own,
but not in for(), if(), switch, do, while, goto, etc.
The C parser didn't always check for that, which led to ICEs earlier
for invalid code.

Add a generic helper that checks this and call it where needed
in the C frontend.

I chose to allow spawn/array for for init and increment expressions.
While the Cilk spec could be interpreted to forbid it there too,
there didn't seem to be any reason to not allow it.

One dark corner is spawn, array in statement expressions not at
the end. Right now that's forbidden too.

gcc/c-family/:

2014-11-10  Andi Kleen  

PR c/60804
* c-common.h (check_no_cilk): Declare.
* cilk.c (get_error_location): New function.
(check_no_cilk): Ditto.

gcc/c/:

2014-11-10  Andi Kleen  

PR c/60804
* c-parser.c (c_parser_statement_after_labels): Call
check_no_cilk.
(c_parser_if_statement): Ditto.
(c_parser_switch_statement): Ditto.
(c_parser_while_statement): Ditto.
(c_parser_do_statement): Ditto.
(c_parser_for_statement): Ditto.
* c-typeck.c (c_finish_loop): Ditto.

Modified:
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-common.h
trunk/gcc/c-family/cilk.c
trunk/gcc/c/ChangeLog
trunk/gcc/c/c-parser.c
trunk/gcc/c/c-typeck.c


[Bug middle-end/60467] ICE with -fcilkplus

2014-11-30 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60467

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||ak at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #5 from ak at gcc dot gnu.org ---
Definitely fixed.


[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335

2014-09-28 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
This still ICEs with gcc version 5.0.0 20140926 (experimental) (GCC)


[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335

2014-09-28 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804

--- Comment #10 from ak at gcc dot gnu.org ---
Reduced test case. It's probably invalid cilk, but gcc shouldn't ICE:

fn1() {
  if (_Cilk_spawn func_2())
;
}


[Bug c/61898] Variadic functions accept va_list without warning

2014-09-28 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898

--- Comment #1 from ak at gcc dot gnu.org ---
I agree such a warning would make sense.


[Bug c/63398] New: Cilk errors out incorrectly for spawn inside statement expressions

2014-09-28 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63398

Bug ID: 63398
   Summary: Cilk errors out incorrectly for spawn inside statement
expressions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ak at gcc dot gnu.org

Like in:

void f2();
int f()
{
return ({ _Cilk_spawn f2(); 0; });
} 

and some other places that use contains_cilk_spawn_stmt to check for errors.
But that should be legal.

The problem is the walk_tree in contains_cilk_spawn_stmt doesn't stop
recursing into the statement.


[Bug tree-optimization/56580] Internal compiler error when trying to compile a sequence of NOPs inside a loop

2014-10-07 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56580

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||ak at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #5 from ak at gcc dot gnu.org ---
Fixed in trunk for some time with

2013-09-08  Andi Kleen  

* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.


[Bug c++/63472] transaction_atomic within while loop causes ICE

2014-10-07 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-10-08
 CC||ak at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from ak at gcc dot gnu.org ---
Confirmed with trunk.

Program received signal SIGSEGV, Segmentation fault.
0x008805b1 in copy_bbs (bbs=0x1e8ecc8, n=9, new_bbs=0x1e8e810,
edges=0x0, num_edges=0, 
new_edges=0x0, base=0x0, after=0x76c3aa90, update_dominance=true) at
../../gcc/gcc/cfghooks.c:1335
1335  if (dom_bb->flags & BB_DUPLICATED)
(gdb) p dom_bb->flags
Cannot access memory at address 0x50
(gdb)


[Bug c++/63472] transaction_atomic within while loop causes ICE

2014-10-07 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472

--- Comment #2 from ak at gcc dot gnu.org ---
Looks like there are more problems with -fgnu-tm

I hacked csmith to generate random __transaction_atomic blocks and I got a lot
of crashes immediately. All I looked at were variants of these two:

0x8e23b7 crash_signal
../../gcc/gcc/toplev.c:340
0x92df5c copy_loops
../../gcc/gcc/tree-inline.c:2379
0x93225c copy_cfg_body
../../gcc/gcc/tree-inline.c:2583
0x93225c copy_body
../../gcc/gcc/tree-inline.c:2777
0x935ab3 tree_function_versioning(tree_node*, tree_node*, vec*, bool, bitmap_head*, bool, bitmap_head*, basic_block_def*)

and

0x6d7465 expand_expr_addr_expr_1
../../gcc/gcc/expr.c:7737
0x6cd9a6 expand_expr_addr_expr
../../gcc/gcc/expr.c:7779
0x6cd9a6 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.c:10604
0x6084f1 expand_normal
../../gcc/gcc/expr.h:457
0x6084f1 precompute_register_parameters
../../gcc/gcc/calls.c:832
0x6084f1 expand_call(tree_node*, rtx_def*, int)
../../gcc/gcc/calls.c:3002
0x5fbeb0 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
../../gcc/gcc/builtins.c:6825
0x6cdd95 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.c:10369
0x6d751a store_expr(tree_node*, rtx_def*, int, bool)
../../gcc/gcc/expr.c:5337
0x6dc2d9 expand_assignment(tree_node*, tree_node*, bool)


[Bug c++/63472] transaction_atomic within while loop causes ICE

2014-10-07 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472

--- Comment #3 from ak at gcc dot gnu.org ---
Another one:

0x8e23b7 crash_signal
../../gcc/gcc/toplev.c:340
0x61be46 copy_bbs(basic_block_def**, unsigned int, basic_block_def**,
edge_def**, unsigned int, edge_def**, loop*, basic_block_def*, bool)
../../gcc/gcc/cfghooks.c:1335
0x8eaecf ipa_uninstrument_transaction
../../gcc/gcc/trans-mem.c:4093
0x8eaecf ipa_tm_scan_calls_transaction
../../gcc/gcc/trans-mem.c:4167
0x8eaecf ipa_tm_execute
../../gcc/gcc/trans-mem.c:5340
0x8eaecf execute
../../gcc/gcc/trans-mem.c:5578


[Bug c++/63472] transaction_atomic within while loop causes ICE

2014-10-07 Thread ak at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63472

--- Comment #4 from ak at gcc dot gnu.org ---
Reduced test cases for all three crashes. I suspect multiple have a similar
root cause (except perhaps for the expand_expr_addr_expr_1 one)

It looks like the transaction code messes up cfgloops.

copy_bbs:

(illegal code due to goto into transaction?)

g_56[];
fn1() {
  int *p_79;
  if (g_56[7])
__transaction_atomic {
lbl_196:
  *p_79 = 1;
}
  else
goto lbl_196;
}


expand_expr_addr_expr_1:

struct {
  unsigned : 7;
  signed f6 : 4
} g_35;
safe_rshift_func_uint16_t_u_s() {}

func_28() {
  __transaction_atomic { safe_rshift_func_uint16_t_u_s(g_35.f6); }
}

copy_loops:

func_65() {
  __transaction_atomic {
for (;;)
  func_65();
  }
}


[Bug other/43448] gccbug should be removed

2010-10-18 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43448

--- Comment #1 from ak at gcc dot gnu.org 2010-10-18 09:39:19 UTC ---
Author: ak
Date: Mon Oct 18 09:39:15 2010
New Revision: 165613

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165613
Log:
Remove gccbug

gcc/

2010-10-18  Andi Kleen  

PR other/43448
* gccbug.in: Remove.
* Makefile.in (GCCBUG_INSTALL_NAME, gccbug): Remove
(doc, distclean, install-common): Remove reference to gccbug.
* configure: Regenerate.
* configure.ac (all_outputs): Remove gccbug.
* doc/configfiles.texi: Remove references to gccbug.
* doc/sourcebuild.texi: Ditto.

contrib/

2010-10-18  Andi Kleen  

* gccbug.el: Remove.

Removed:
trunk/contrib/gccbug.el
trunk/gcc/gccbug.in
Modified:
trunk/contrib/ChangeLog
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/configure
trunk/gcc/configure.ac
trunk/gcc/doc/configfiles.texi
trunk/gcc/doc/sourcebuild.texi


[Bug tree-optimization/36602] memset should be optimized into an empty CONSTRUCTOR

2011-06-22 Thread ak at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36602

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org,
   ||mjambor at suse dot cz

--- Comment #3 from ak at gcc dot gnu.org 2011-06-22 20:30:56 UTC ---
I ran into a similar problem in my code.

It would be nice if memset didn't break SRA.


[Bug target/93768] Use vpternlog for composite logical operations

2022-09-25 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93768

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org
 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from ak at gcc dot gnu.org ---
Most of it is already done as part of PR101989

One issue is that it only applies to suitable vector types. It doesn't really work
for scalars because the compiler has no idea that a conversion might be
profitable. Perhaps it would be an interesting (but likely separate) feature
to define some framework to figure out if switching to the vector ISA is worth
it.

*** This bug has been marked as a duplicate of bug 101989 ***
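
For context, vpternlog takes an 8-bit immediate that encodes the truth table of an arbitrary three-input boolean function. A common way to derive that immediate is to evaluate the expression on the constants 0xF0, 0xCC and 0xAA. The C sketch below does this for the bit-select expression from PR101989; the function names are invented, and which instruction operand corresponds to which constant is an assumption of this example.

```c
#include <stdint.h>

/* Canonical truth-table columns for the three inputs.  */
enum { TT_A = 0xF0, TT_B = 0xCC, TT_C = 0xAA };

/* Bitwise select: take bits of a where b is set, bits of c elsewhere.  */
static inline uint32_t bit_select (uint32_t a, uint32_t b, uint32_t c)
{
  return (a & b) | (c & ~b);
}

/* 8-bit truth table of bit_select, obtained by evaluating the
   expression on the canonical columns.  */
uint8_t bit_select_truth_table (void)
{
  return (uint8_t) bit_select (TT_A, TT_B, TT_C);
}
```

With this operand order the table works out to 0xE2, since (0xF0 & 0xCC) | (0xAA & 0x33) = 0xC0 | 0x22.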

[Bug target/101989] Fail to optimize (a & b) | (c & ~b) to vpternlog instruction.

2022-09-25 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101989

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||rth at gcc dot gnu.org

--- Comment #8 from ak at gcc dot gnu.org ---
*** Bug 93768 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-08-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
 CC||ak at gcc dot gnu.org

--- Comment #6 from ak at gcc dot gnu.org ---
Change checked in

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-08-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115866, which changed state.

Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/115130] [meta-bug] early break vectorization

2024-08-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Bug 115130 depends on bug 115866, which changed state.

Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug testsuite/116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-09-03 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||ak at gcc dot gnu.org

--- Comment #12 from ak at gcc dot gnu.org ---
Patch checked in

[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|WAITING
 Resolution|DUPLICATE   |---

--- Comment #6 from ak at gcc dot gnu.org ---
No this is not a dup. This bug is about early break.

The other bug is about switch.
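
To make the distinction concrete, here are two loop shapes in C. They are illustrative only and are not the test cases attached to either PR; the function names are invented. The first loop contains a switch in its body, the second leaves the loop early on more than one data-dependent condition.

```c
#include <stddef.h>

/* Loop whose body is a switch statement (the PR115866 kind of shape).  */
int sum_switch (const int *x, size_t n)
{
  int s = 0;
  for (size_t i = 0; i < n; i++)
    switch (x[i])
      {
      case 1:
      case 2:
	s += 1;
	break;
      case 3:
	s += 2;
	break;
      default:
	break;
      }
  return s;
}

/* Loop with several early exits (the PR116520 kind of shape).  */
ptrdiff_t find_first (const int *x, size_t n, int a, int b)
{
  for (size_t i = 0; i < n; i++)
    {
      if (x[i] == a)
	return (ptrdiff_t) i;
      if (x[i] == b)
	return (ptrdiff_t) i;
    }
  return -1;
}
```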

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from ak at gcc dot gnu.org ---
It is still fixed.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115866, which changed state.

Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 116520, which changed state.

Bug 116520 Summary: Multiple condition lead to missing vectorization due to 
missing early break
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

   What|Removed |Added

 Status|RESOLVED|WAITING
 Resolution|DUPLICATE   |---

[Bug tree-optimization/115130] [meta-bug] early break vectorization

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Bug 115130 depends on bug 116520, which changed state.

Bug 116520 Summary: Multiple condition lead to missing vectorization due to 
missing early break
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

   What|Removed |Added

 Status|RESOLVED|WAITING
 Resolution|DUPLICATE   |---

[Bug tree-optimization/115130] [meta-bug] early break vectorization

2024-09-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Bug 115130 depends on bug 115866, which changed state.

Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug preprocessor/79465] infinite #include cycle is not detected

2024-06-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79465

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org
 Resolution|DUPLICATE   |---
   Last reconfirmed||2024-06-26
 Ever confirmed|0   |1
 Status|RESOLVED|NEW

[Bug c/115704] New: -Wstringop-overread and related warnings should print inline stack

2024-06-28 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115704

Bug ID: 115704
   Summary: -Wstringop-overread and related warnings should print
inline stack
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

Forked from PR115274.
These warnings often depend on inlining and the exact caller, and the user
needs to know that context to determine whether they are real or not.

[Bug tree-optimization/115274] Bogus -Wstringop-overread in SQLite source code

2024-06-28 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115274

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-29
 CC||ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
creduce minimized it to

#include <string.h>
char *c;
void a();
int b(char *d) { return strlen(d); }
void e() {
  long f = 1;
  f = b(c + f);
  if (c == 0)
a(f);
}

From the reduced case the code seems to be invalid, because the c global is
indeed NULL.

But it's hard to say if it is exactly equivalent because it will depend on the
caller and the original test case had something like 30+ callers, so we don't
know the exact context.

The problem is that these warnings, which depend on inlining, should really print
the inline stack for the instance that triggers the warning. I opened PR115704
for that.

[Bug c++/115728] Feature Request: inline assembly improvements for C++

2024-07-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115728

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||ak at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from ak at gcc dot gnu.org ---
The constexpr asm support is in trunk. It supports templates.


>The second is I want finer grain control over marking memory regions as
>needing to be updated before inline assembly code is executed, or invalidated
>after.

You can do that by specifying the memory region to be updated in the
input/output list
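
A minimal sketch of that technique in C, assuming GCC-style extended asm: an array-typed "+m" operand tells the compiler that the asm may read and write exactly that region, without a blanket "memory" clobber. The asm body is empty here so the example is target-independent; touch_region and bump_and_sum are made-up names.

```c
#include <stddef.h>

/* Declare the n ints starting at buf as read and written by the asm.
   The cast to a pointer-to-array gives the operand the size of the
   whole region rather than a single element.  */
void touch_region (int *buf, size_t n)
{
  __asm__ volatile ("" : "+m" (*(int (*)[n]) buf));
}

int bump_and_sum (int *buf, size_t n)
{
  buf[0] += 1;
  touch_region (buf, n);	/* stores above must be visible to the asm */
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += buf[i];		/* values are reloaded after the asm */
  return s;
}
```

Using an input-only "m" operand instead would mark the region as read but not modified.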

[Bug c/83324] [feature request] Pragma or special syntax for guaranteed tail calls

2024-07-23 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83324

ak at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||ak at gcc dot gnu.org
 Status|ASSIGNED|RESOLVED

--- Comment #27 from ak at gcc dot gnu.org ---
Implemented in trunk in a mostly LLVM compatible way. There are some remaining
open issues (PR116019, PR115979, PR115606, PR115607), but none should be show
stoppers.

There are some differences to clang, mainly that gcc handles a few cases that
clang doesn't, but clang handles more cases with -O0. The success also depends
on the architecture and the languages (C is better than C++ due to PR115606)
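
A small usage sketch in C, under the assumption that the compiler accepts the clang-style statement attribute spelled __attribute__((musttail)). The SKETCH_MUSTTAIL macro and count_up are invented for the example; on compilers that do not report the attribute, the macro expands to nothing and the call is an ordinary one.

```c
/* Use the attribute only where the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define SKETCH_MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef SKETCH_MUSTTAIL
# define SKETCH_MUSTTAIL	/* fallback: ordinary call */
#endif

/* Tail-recursive counter; caller and callee share one signature,
   which keeps the tail call easy to honor.  */
long count_up (long n, long acc)
{
  if (n == 0)
    return acc;
  SKETCH_MUSTTAIL return count_up (n - 1, acc + 1);
}
```

When the attribute is active, the compiler must either emit a real tail call or reject the code, so the recursion depth does not grow the stack.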

[Bug middle-end/116510] [15 Regression] ice in decompose, at wide-int.h:1049

2024-10-15 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116510

--- Comment #12 from ak at gcc dot gnu.org ---
Like this?  It fixes the test case.

I'm not sure why you want AND_EXPR, this is a truth formula. Maybe it should be
TRUTH_ANDIF_EXPR though to short circuit.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 90c754a48147..376a4642954d 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1477,10 +1477,12 @@ predicate_bbs (loop_p loop)
{
  tree low = build2_loc (loc, GE_EXPR,
 boolean_type_node,
-index, CASE_LOW (label));
+index, fold_convert_loc (loc, TREE_TYPE (index),
+CASE_LOW (label)));
  tree high = build2_loc (loc, LE_EXPR,
  boolean_type_node,
- index, CASE_HIGH (label));
+ index, fold_convert_loc (loc, TREE_TYPE (index),
+ CASE_HIGH (label)));
  case_cond = build2_loc (loc, TRUTH_AND_EXPR,
  boolean_type_node,
  low, high);
@@ -1489,7 +1491,8 @@ predicate_bbs (loop_p loop)
case_cond = build2_loc (loc, EQ_EXPR,
boolean_type_node,
index,
-   CASE_LOW (gimple_switch_label (sw, i)));
+   fold_convert_loc (loc, TREE_TYPE (index),
+ CASE_LOW (label)));
  if (i > 1)
switch_cond = build2_loc (loc, TRUTH_OR_EXPR,
  boolean_type_node,

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-11 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

ak at gcc dot gnu.org changed:

   What|Removed |Added

Summary|bit_test_cluster takes  |switch clustering takes
   |extensive time with large   |extensive time with large
   |switches even at -O0|switches even at -O0

--- Comment #3 from ak at gcc dot gnu.org ---
With -fno-bit-tests -fno-jump-tables it compiles reasonably fast.
One bug is really that these two options are enabled by default even at -O0.
tree-switch-conversion has some logic for this, but it seems to be broken.

Second step would be to figure out how to improve the clustering algorithm
scaling.

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-11 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

--- Comment #4 from ak at gcc dot gnu.org ---
Here's a patch that enables the slow switch conversions only at -O2.
With that the test case builds reasonably quickly. 

diff --git a/gcc/common.opt b/gcc/common.opt
index 12b25ff486de..4af7a94fea42 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2189,11 +2189,11 @@ Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.

 fjump-tables
-Common Var(flag_jump_tables) Init(1) Optimization
+Common Var(flag_jump_tables) Init(-1) Optimization
 Use jump tables for sufficiently large switch statements.

 fbit-tests
-Common Var(flag_bit_tests) Init(1) Optimization
+Common Var(flag_bit_tests) Init(-1) Optimization
 Use bit tests for sufficiently large switch statements.

 fkeep-inline-functions
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index 6468995eb316..1cca23671d70 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -442,7 +442,7 @@ public:
   /* Return whether bit test expansion is allowed.  */
   static inline bool is_enabled (void)
   {
-return flag_bit_tests;
+return flag_bit_tests >= 0 ? flag_bit_tests : (optimize > 1);
   }

   /* True when the jump table handles an entire switch statement.  */
@@ -524,7 +524,8 @@ bool jump_table_cluster::is_enabled (void)
  over-ruled us, we really have no choice.  */
   if (!targetm.have_casesi () && !targetm.have_tablejump ())
 return false;
-  if (!flag_jump_tables)
+  int flag = flag_jump_tables >= 0 ? flag_jump_tables : (optimize > 1);
+  if (!flag)
 return false;
 #ifndef ASM_OUTPUT_ADDR_DIFF_ELT
   if (flag_pic)
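
The pattern used in the patch above is a tri-state option: -1 means the user did not set the flag, so the default is derived from the optimization level. A standalone C sketch of that logic follows; effective_flag and its parameter names are hypothetical.

```c
/* flag: -1 = not given on the command line, 0 = explicitly off,
   1 = explicitly on.  An explicit setting always wins; otherwise the
   feature is on only above -O1.  */
int effective_flag (int flag, int optimize_level)
{
  return flag >= 0 ? flag : (optimize_level > 1);
}
```

This keeps -fjump-tables and -fno-jump-tables working at any optimization level while changing only the default.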

[Bug middle-end/117091] bit_test_cluster takes extensive time with large switches even at -O0

2024-10-11 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

ak at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2024-10-11
Summary|compile time Regression in  |bit_test_cluster takes
   |GCC Trunk vs GCC 6.1|extensive time with large
   ||switches even at -O0
 CC||ak at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from ak at gcc dot gnu.org ---
Problem seems to be in the bit test cluster detection

 87.71%  cc1  [.] tree_switch_conversion::bit_test_cluster::can_be_handled(vec const&,
  5.73%  cc1  [.] tree_switch_conversion::bit_test_cluster::find_bit_tests(vec&)
  4.78%  cc1  [.] tree_switch_conversion::bit_test_cluster::can_be_handled(unsigned long, unsigned int)

Perhaps the bit_test_cluster check should depend on -O2, or need some limit.

[Bug middle-end/117091] bit_test_cluster takes extensive time with large switches even at -O0

2024-10-11 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

--- Comment #2 from ak at gcc dot gnu.org ---
Minimum patch. Only enable the clustering at -O2.

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 00426d46..468b15f1c461 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1375,7 +1375,8 @@ switch_conversion::expand (gswitch *swtch)
   gcc_checking_assert (TREE_TYPE (m_index_expr) != error_mark_node);

   /* Prefer bit test if possible.  */
-  if (tree_fits_uhwi_p (m_range_size)
+  if (optimize >= 2
+  && tree_fits_uhwi_p (m_range_size)
   && bit_test_cluster::can_be_handled (tree_to_uhwi (m_range_size),
m_uniq)
   && bit_test_cluster::is_beneficial (m_count, m_uniq))
 {

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-16 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #11 from ak at gcc dot gnu.org ---
Adding RichardS for the late-combine issue.

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-16 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

--- Comment #10 from ak at gcc dot gnu.org ---
https://github.com/andikleen/gcc/commit/9a71a4dbdd7094241bcdb0b89d7261c19dcc4b34

fixes the test case by checking early that bit clustering only works when
multiple labels point to the same code. It still needs a limit on the clusters
however.
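
A rough sketch of what such an early check could look like. This is not the code from the linked commit; the function name and the array-of-label-ids representation are hypothetical. The idea is only that if every case jumps to its own label, no bit test can cover more than one case, so the expensive clustering can be skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison of two ints for qsort.  */
static int cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Hypothetical stand-in for a switch: targets[i] is the id of the label
   that case i jumps to.  Returns true if at least two cases share a label.
   Sorts a copy, so the cost is O(n log n) rather than quadratic.  */
bool any_shared_target (const int *targets, size_t n)
{
  assert (targets != NULL || n == 0);
  if (n < 2)
    return false;
  int *sorted = malloc (n * sizeof *sorted);
  if (!sorted)
    return true; /* Conservative: let the caller run the full analysis.  */
  memcpy (sorted, targets, n * sizeof *sorted);
  qsort (sorted, n, sizeof *sorted, cmp_int);
  bool shared = false;
  for (size_t i = 1; i < n && !shared; i++)
    shared = sorted[i] == sorted[i - 1];
  free (sorted);
  return shared;
}
```

A caller would skip bit test clustering when this returns false. It does not address the separate need for a limit on the number of clusters.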

With that fix the test case shows a new issue: late combine is constantly
undoing and redoing things. Is that a known problem?


-   35.83%  cc1  cc1   [.] temporarily_undo_changes(int)
     35.82% temporarily_undo_changes(int)
            rtl_ssa::insn_info::calculate_cost() const
            rtl_ssa::changes_are_worthwhile(array_slice, bool)
            (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
            (anonymous namespace)::pass_late_combine::execute(function*)
            execute_one_pass(opt_pass*)
            execute_pass_list_1(opt_pass*)
          + execute_pass_list_1(opt_pass*)
-   34.44%  cc1  cc1   [.] redo_changes(int)
     34.43% rtl_ssa::insn_info::calculate_cost() const
            rtl_ssa::changes_are_worthwhile(array_slice, bool)
            (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
            (anonymous namespace)::pass_late_combine::execute(function*)
            execute_one_pass(opt_pass*)
            execute_pass_list_1(opt_pass*)
          + execute_pass_list_1(opt_pass*)

[Bug target/117312] RFE: x86 (and perhaps others): inline assembly: "red-zone" clobber

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117312

--- Comment #5 from ak at gcc dot gnu.org ---
Peter, can you construct a test case that demonstrates the problem?

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #20 from ak at gcc dot gnu.org ---
Reopening because the patch with the new algorithm has been reverted due to
PR117352.
The new algorithm doesn't take range comparisons into account, and probably
needs to understand CCMP.

[Bug c++/117351] New: ICE while reporting invalid template error

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117351

            Bug ID: 117351
           Summary: ICE while reporting invalid template error
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

While trying to reduce another problem I hit this: 

foo.cc:

template <_Lp> struct __shared_ptr_access {
  template 
  using __esft_base_t decltype(__enable_shared_from_this_base());
  template  __esft_base_t<_Yp>


cc1plus foo.cc
gu.cc:1:11: error: ‘_Lp’ has not been declared
1 | template <_Lp> struct __shared_ptr_access {
  |   ^~~
gu.cc:3:23: error: expected ‘=’ before ‘decltype’ [-Wtemplate-body]
3 |   using __esft_base_t decltype(__enable_shared_from_this_base());
  |   ^~~~
gu.cc:3:32: error: there are no arguments to ‘__enable_shared_from_this_base’
that depend on a template parameter, so a declaration of
‘__enable_shared_from_this_base’ must be available [-Wtemplate-body]
3 |   using __esft_base_t decltype(__enable_shared_from_this_base());
  |^~
gu.cc:3:32: note: (if you use ‘-fpermissive’, G++ will accept your code, but
allowing the use of an undeclared name is deprecated)
gu.cc:3:32: error: there are no arguments to ‘__enable_shared_from_this_base’
that depend on a template parameter, so a declaration of
‘__enable_shared_from_this_base’ must be available [-Wtemplate-body]
gu.cc: In substitution of ‘template< >
template using __shared_ptr_access< >::__esft_base_t =
decltype (__enable_shared_from_this_base()) [with  =
_Yp;  = ]’:
gu.cc:4:44:   required from here
4 |   template  __esft_base_t<_Yp>
  |^
gu.cc:3:62: error: ‘__enable_shared_from_this_base’ was not declared in this
scope
3 |   using __esft_base_t decltype(__enable_shared_from_this_base());
  |~~^~
gu.cc:4:44: internal compiler error: Segmentation fault
4 |   template  __esft_base_t<_Yp>
  |^
0x27adb7f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518
0x135ed66 crash_signal
../../gcc/gcc/toplev.cc:323
0x9fe1b7 tree_class_check(tree_node*, tree_code_class, char const*, int, char
const*)
../../gcc/gcc/tree.h:3797
0x9fe1b7 pop_nested_class()
../../gcc/gcc/cp/class.cc:8636
0xc322d9 instantiate_template(tree_node*, tree_node*, int)
../../gcc/gcc/cp/pt.cc:22304
0xc33c8a instantiate_alias_template(tree_node*, tree_node*, int) [clone
.part.0] [clone .lto_priv.0]
../../gcc/gcc/cp/pt.cc:22400
0xc07640 instantiate_alias_template
../../gcc/gcc/cp/pt.cc:22378
0xc07640 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*,
int)
../../gcc/gcc/cp/pt.cc:10177
0xc4c037 finish_template_type(tree_node*, tree_node*, int)
../../gcc/gcc/cp/semantics.cc:4144
0xb92ced cp_parser_template_id
../../gcc/gcc/cp/parser.cc:19301
0xb92f65 cp_parser_class_name
../../gcc/gcc/cp/parser.cc:26973
0xb97d12 cp_parser_qualifying_entity
../../gcc/gcc/cp/parser.cc:7438
0xb97d12 cp_parser_nested_name_specifier_opt
../../gcc/gcc/cp/parser.cc:7124
0xbacdb4 cp_parser_constructor_declarator_p
../../gcc/gcc/cp/parser.cc:32800
0xbacdb4 cp_parser_decl_specifier_seq
../../gcc/gcc/cp/parser.cc:16872
0xbb7068 cp_parser_single_declaration
../../gcc/gcc/cp/parser.cc:33467
0xbb7662 cp_parser_template_declaration_after_parameters
../../gcc/gcc/cp/parser.cc:33228
0xbb7662 cp_parser_explicit_template_declaration
../../gcc/gcc/cp/parser.cc:33398
0xb86e10 cp_parser_member_specification_opt
../../gcc/gcc/cp/parser.cc:28187
0xb86e10 cp_parser_class_specifier
../../gcc/gcc/cp/parser.cc:27166
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #3 from ak at gcc dot gnu.org ---
Reduced test case for an Intel platform:

gu.cc:
template  class tuple;
template  struct tuple<_T1, _T2> {
  tuple(_T1, _T2);
};
struct __uniq_ptr_impl {
  __uniq_ptr_impl(int __p, int) : _M_t(__p, int()) {}
  tuple _M_t;
};
struct __uniq_ptr_data : __uniq_ptr_impl {
  __uniq_ptr_impl::__uniq_ptr_impl;
};
template  struct unique_ptr {
  __uniq_ptr_data _M_t;
  template 
  unique_ptr(unique_ptr<_Up, _Ep>) : _M_t(0, _Ep()) {}
};
template  unique_ptr make_unique();
namespace {
struct gcc_urlifier;
}
unique_ptr make_gcc_urlifier() { return make_unique(); }


perf record -c 10003 -b -e br_inst_retired.all_branches:pu ./cc1plus  -O2 
-flto=jobserver   gu.cc
create_gcov -binary cc1plus --gcov_version 2
cc1plus  -O2  -flto=jobserver  -fauto-profile=fbdata.afdo gu.cc


(Note: you may need to change the perf event name on other CPUs; see perf list
branch.)

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #19 from ak at gcc dot gnu.org ---
Fixed for the switch part. There is still a problem with late combine; this is
tracked in PR117297.

[Bug middle-end/117352] New: switch bit test conversion makes comparison code worse

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117352

            Bug ID: 117352
           Summary: switch bit test conversion makes comparison code worse
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

With the change in PR117091 that makes switch bit test conversion more
aggressive, I see a failure in gcc.dg/pr21643.c, which checks for tree reassoc
happening.
I fixed the test by using -fno-bit-tests.

However this makes the generated code on aarch64 worse:

int
f1 (unsigned char c)
{
  if (c == 0x22 || c == 0x20 || c < 0x20)
    return 1;
  return 0;
}

Before (with -fno-bit-tests or without PR117091 change)

f1:
.LFB0:
        and     w0, w0, 255
        mov     w1, 32
        cmp     w0, 34
        ccmp    w0, w1, 0, ne
        cset    w0, ls
        ret


After:

f1:
.LFB0:
        and     w0, w0, 255
        mov     x1, -281449206906881
        movk    x1, 0x0, lsl 48
        cmp     w0, 35
        lsr     x0, x1, x0
        and     w0, w0, 1
        csel    w0, w0, wzr, cc
        ret

So I guess tree-reassoc needs to learn to handle bit test switch code better?
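
For reference, the "After" sequence corresponds to a bit test of roughly the following shape. This is a sketch written from the assembly, not compiler output. The mov/movk pair builds 0xFFFF0005FFFFFFFF and then clears the top 16 bits, leaving the mask 0x5FFFFFFFF (bits 0..32 and bit 34 set). The names f1_bit_test, count_mismatches and check_equivalence are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Original form from the report: two equality tests plus a range test.  */
int f1 (unsigned char c)
{
  if (c == 0x22 || c == 0x20 || c < 0x20)
    return 1;
  return 0;
}

/* Bit test form: bit N of the mask is set when f1 (N) is 1.  */
int f1_bit_test (unsigned char c)
{
  const uint64_t mask = UINT64_C (0x5FFFFFFFF);
  /* The range check (cmp w0, 35 / csel ..., cc) selects 0 for c >= 35.  */
  return c < 35 ? (int) ((mask >> c) & 1) : 0;
}

/* Returns the number of inputs on which the two forms disagree.  */
int count_mismatches (void)
{
  int bad = 0;
  for (int c = 0; c <= 255; c++)
    if (f1 ((unsigned char) c) != f1_bit_test ((unsigned char) c))
      bad++;
  return bad;
}

/* Aborts if the two forms are not equivalent over all inputs.  */
void check_equivalence (void)
{
  assert (count_mismatches () == 0);
}
```

Both forms compute the same function; the difference is only that the bit test needs a 64-bit constant, a shift and a select where the ccmp form gets by with two compares.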

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #5 from ak at gcc dot gnu.org ---
Also the ICE had a truncated backtrace. Checking it in gdb gives the full one.
The bad mangling happens while autofdo is reading the string table of the afdo
file, and trying to generate the asm name of an arbitrary decl.


#0  write_unscoped_name (decl=) at
../../gcc/gcc/cp/mangle.cc:1197
#1  0x00b23797 in write_unscoped_template_name (decl=)
at ../../gcc/gcc/cp/mangle.cc:1215
#2  write_name (decl=,
ignore_local_scope=) at ../../gcc/gcc/cp/mangle.cc:1122
#3  0x00b24a2a in write_encoding (decl=) at ../../gcc/gcc/cp/mangle.cc:939
#4  0x00b24c0a in write_mangled_name (decl=decl@entry=, top_level=top_level@entry=true)
at ../../gcc/gcc/cp/mangle.cc:821
#5  0x00b2cba4 in mangle_decl_string (decl=decl@entry=)
at ../../gcc/gcc/cp/mangle.cc:4428
#6  0x00b2cda8 in get_mangled_id (decl=) at ../../gcc/gcc/cp/mangle.cc:4449
#7  mangle_decl (decl=) at
../../gcc/gcc/cp/mangle.cc:4487
#8  0x016ad32b in decl_assembler_name (decl=) at ../../gcc/gcc/tree.cc:728
#9  0x024e1007 in autofdo::string_table::get_index_by_decl
(this=0x34c0b9c0, decl=)

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|1                           |0
             Status|WAITING                     |UNCONFIRMED
                 CC|                            |jason at redhat dot com

--- Comment #4 from ak at gcc dot gnu.org ---
This is the originally failing assert

1194      /* If not, it should be either in the global namespace, or directly
1195         in a local function scope.  A lambda can also be mangled in the
1196         scope of a default argument.  */
1197      gcc_assert (context == global_namespace
1198                  || TREE_CODE (context) == PARM_DECL
1199                  || TREE_CODE (context) == FUNCTION_DECL);

context is 

 constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7f17c25dd930
fields  unit-size

align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7f17c25d42a0 fields  context

pointer_to_this  reference_to_this
>
used nonlocal decl_3 QI gu.cc:13:19 size 
unit-size 
align:8 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset 
bit-offset  context > context 
pointer_to_this  reference_to_this
>

[Bug other/117350] New: ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

            Bug ID: 117350
           Summary: ICE in pretty print during bootstrap
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

With --with-build-config=bootstrap-lto make autoprofiledbootstrap

I get

/home/ak/gcc/obj-auto/./prev-gcc/xg++ -B/home/ak/gcc/obj-auto/./prev-gcc/
-B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++
-B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
-B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include
-I/home/ak/gcc/gcc/libstdc++-v3/libsupc++
-L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
-L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-fno-PIE -c -g -O2 -fchecking=1 -flto=jobserver -frandom-seed=1
-DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-error=narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -fno-PIE
-fauto-profile=cc1plus.fda -fauto-profile=cc1plus.fda -I. -I. -I../../gcc/gcc
-I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
-I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber
-I../../gcc/gcc/../libdecnumber/bid -I../libdecnumber
-I../../gcc/gcc/../libbacktrace -o gcc-urlifier.o -MT gcc-urlifier.o -MMD -MP
-MF ./.deps/gcc-urlifier.TPo ../../gcc/gcc/gcc-urlifier.cc
during GIMPLE pass: einline

In file included from
/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:37,
 from
/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/memory:80,
 from ../../gcc/gcc/system.h:766,
 from ../../gcc/gcc/gcc-urlifier.cc:23:
in format_phase_2, at pretty-print.cc:2162
 2088 | tuple()
  | ^
0x27adb7f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518
0x9ba31b fancy_abort(char const*, int, char const*)
../../gcc/gcc/diagnostic.cc:1580
0x9bd456 format_phase_2
../../gcc/gcc/pretty-print.cc:2162
0x9bd456 pretty_printer::format(text_info&)
../../gcc/gcc/pretty-print.cc:1712
0x282b62e pp_format(pretty_printer*, text_info*)
../../gcc/gcc/pretty-print.h:602
0x282b62e pp_format_verbatim(pretty_printer*, text_info*)
../../gcc/gcc/pretty-print.cc:2340
0x282b62e pp_verbatim(pretty_printer*, char const*, ...)
../../gcc/gcc/pretty-print.cc:2619
0xadef66 print_instantiation_full_context
../../gcc/gcc/cp/error.cc:3855
0xadef66 maybe_print_instantiation_context
../../gcc/gcc/cp/error.cc:4010
0xadef66 maybe_print_instantiation_context
../../gcc/gcc/cp/error.cc:4004
0x13c5e59 default_tree_diagnostic_text_starter
../../gcc/gcc/tree-diagnostic.cc:52
0x27ab84f diagnostic_text_output_format::on_report_diagnostic(diagnostic_info
const&, diagnostic_t)
../../gcc/gcc/diagnostic-format-text.cc:207
0x27acf66 diagnostic_context::report_diagnostic(diagnostic_info*)
../../gcc/gcc/diagnostic.cc:1357
0x27ad2ce diagnostic_context::diagnostic_impl(rich_location*,
diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag
(*) [1], diagnostic_t)
../../gcc/gcc/diagnostic.cc:1472
0x27adb7f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518
0x9ba31b fancy_abort(char const*, int, char const*)
../../gcc/gcc/diagnostic.cc:1580
0x74fbd5 write_unscoped_name
../../gcc/gcc/cp/mangle.cc:1197
0xb23796 write_unscoped_template_name
../../gcc/gcc/cp/mangle.cc:1215
0xb23796 write_name
../../gcc/gcc/cp/mangle.cc:1122
0xb24a29 write_encoding
../../gcc/gcc/cp/mangle.cc:939
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #8 from ak at gcc dot gnu.org ---
It's when reading the profile file, so stage 4 (?)

The full log is here: http://firstfloor.org/~andi/l2

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #10 from ak at gcc dot gnu.org ---
The small test case also fails with gcc 13.0 (although it doesn't have the
nested ICE). So it's an old latent bug.

gcc version 13.3.1 20240913 (Red Hat 13.3.1-3) (GCC)

gcc -fauto-profile=fbdata.afdo gu.cc -O2 -flto
gu.cc:10:3: warning: access declarations are deprecated in favour of
using-declarations; suggestion: add the ‘using’ keyword [-Wdeprecated]
   10 |   __uniq_ptr_impl::__uniq_ptr_impl;
  |   ^~~
gu.cc:17:37: warning: ‘unique_ptr make_unique() [with T =
{anonymous}::gcc_urlifier]’ used but never defined
   17 | template  unique_ptr make_unique();
  | ^~~
during GIMPLE pass: einline
‘
in pp_format, at pretty-print.cc:1478
   15 |   unique_ptr(unique_ptr<_Up, _Ep>) : _M_t(0, _Ep()) {}
  |   ^~
Please submit a full bug report, with preprocessed source.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cc3ZB46l.out file, please attach this to
your bugreport.

[Bug bootstrap/117350] ICE in pretty print during bootstrap

2024-10-29 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #11 from ak at gcc dot gnu.org ---

Given that it reproduces with the distribution gcc 13.0, I don't think it's a
miscompilation.

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-10-30 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #15 from ak at gcc dot gnu.org ---
I guess to debug this I have to figure out what's different about the decl
between the non-autofdo case and the autofdo case.

I tried to work around it by modifying the urlifier code to avoid the anonymous
namespace, but it hits a similar bug later in gimple-range-fold.cc. Here is a
full build log of that attempt: http://firstfloor.org/~andi/l

/home/ak/gcc/obj-auto/./prev-gcc/xg++ -B/home/ak/gcc/obj-auto/./prev-gcc/
-B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++
-B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
-B/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/include
-I/home/ak/gcc/gcc/libstdc++-v3/libsupc++
-L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
-L/home/ak/gcc/obj-auto/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-fno-PIE -c -g -O2 -fchecking=1 -flto=jobserver -frandom-seed=1
-DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-error=narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -fno-PIE
-fauto-profile=cc1plus.fda -I. -I. -I../../gcc/gcc -I../../gcc/gcc/.
-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
-I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber
-I../../gcc/gcc/../libdecnumber/bid -I../libdecnumber
-I../../gcc/gcc/../libbacktrace -o gimple-range-fold.o -MT gimple-range-fold.o
-MMD -MP -MF ./.deps/gimple-range-fold.TPo ../../gcc/gcc/gimple-range-fold.cc
0x27a4b3f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518
0x9bb2a5 fancy_abort(char const*, int, char const*)
../../gcc/gcc/diagnostic.cc:1693
0x9bdd90 format_phase_2
../../gcc/gcc/pretty-print.cc:2162
0x9bdd90 pretty_printer::format(text_info&)
../../gcc/gcc/pretty-print.cc:1712
0x282925e pp_format(pretty_printer*, text_info*)
../../gcc/gcc/pretty-print.h:602
0x282925e pp_format_verbatim(pretty_printer*, text_info*)
../../gcc/gcc/pretty-print.cc:2340
0x282925e pp_verbatim(pretty_printer*, char const*, ...)
../../gcc/gcc/pretty-print.cc:2619
0xae1ea3 print_instantiation_full_context
../../gcc/gcc/cp/error.cc:3807
0xae1ea3 maybe_print_instantiation_context
../../gcc/gcc/cp/error.cc:3962
0xae1ea3 maybe_print_instantiation_context
../../gcc/gcc/cp/error.cc:3956
0x13adc56 default_tree_diagnostic_text_starter
../../gcc/gcc/tree-diagnostic.cc:52
0x27a2ba0 diagnostic_text_output_format::on_report_diagnostic(diagnostic_info
const&, diagnostic_t)
../../gcc/gcc/diagnostic-format-text.cc:210
0x27a3f62 diagnostic_context::report_diagnostic(diagnostic_info*)
0x27a434e diagnostic_context::diagnostic_impl(rich_location*,
diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag
(*) [1], diagnostic_t)
../../gcc/gcc/diagnostic.cc:1585
0x27a4b3f internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:518
0x9bb2a5 fancy_abort(char const*, int, char const*)
../../gcc/gcc/diagnostic.cc:1693
0x750805 write_unscoped_name
../../gcc/gcc/cp/mangle.cc:1197
0xb22496 write_unscoped_template_name
../../gcc/gcc/cp/mangle.cc:1215
0xb22496 write_name
../../gcc/gcc/cp/mangle.cc:1122
0xb23729 write_encoding
../../gcc/gcc/cp/mangle.cc:939
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-10-31 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #21 from ak at gcc dot gnu.org ---
Thanks. 

I'll see if this patch is enough:

diff --git a/gcc/tree.cc b/gcc/tree.cc
index b4c059d3b0db..92f99eaccd72 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -787,8 +787,9 @@ need_assembler_name_p (tree decl)
   || DECL_ASSEMBLER_NAME_SET_P (decl))
 return false;

-  /* Abstract decls do not need an assembler name.  */
-  if (DECL_ABSTRACT_P (decl))
+  /* Abstract decls do not need an assembler name, except they
+ can be looked up by autofdo.  */
+  if (DECL_ABSTRACT_P (decl) && !flag_auto_profile)
 return false;

   /* For VAR_DECLs, only static, public and external symbols need an

[Bug target/117312] RFE: x86 (and perhaps others): inline assembly: "red-zone" clobber

2024-10-27 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117312

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #3 from ak at gcc dot gnu.org ---
This must be hit in lots of application code using inline asm? I wonder why
no one complained.

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-10-31 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #17 from ak at gcc dot gnu.org ---
http://firstfloor.org/~andi/fbdata.afdo is the gcov file for the reproducer
above.

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-10-31 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #16 from ak at gcc dot gnu.org ---
I'm not sure the revision in the subject is right. Given the reproduction in
gcc 13 it seems to me this is a latent bug that is just triggered by changes in
the bootstrapped input source. Strangely it is now triggered by at least two
places, so something else might have changed.

The initial failure comes from this assert failing

1194      /* If not, it should be either in the global namespace, or directly
1195         in a local function scope.  A lambda can also be mangled in the
1196         scope of a default argument.  */
1197      gcc_assert (context == global_namespace
1198                  || TREE_CODE (context) == PARM_DECL
1199                  || TREE_CODE (context) == FUNCTION_DECL);

When I look at it in rr I see

(rr) p context
$5 = 
(rr) p decl
$7 = 

This doesn't look like garbage from freed data, more like some logic problem.

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-10-31 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

--- Comment #18 from ak at gcc dot gnu.org ---

Okay I looked into need_assembler_name_p. For __ct function_decl it bails out
due to


784   /* If DECL already has its assembler name set, it does not need a
785  new one.  */
786   if (!HAS_DECL_ASSEMBLER_NAME_P (decl)
787   || DECL_ASSEMBLER_NAME_SET_P (decl))
788 return false;

 >
QI
size 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fc0883d0930 method basetype 
arg-types 
chain 
chain 
chain >>>>
pointer_to_this >
used public abstract external QI ../gu.cc:3:3 align:16 warn_if_not_align:0
context  abstract_origin 
full-name "tuple<_T1, _T2>::tuple(_T1, _T2) [with _T1 = int; _T2 = int]"
template-info 
VOID ../gu.cc:3:3
align:1 warn_if_not_align:0 context  result 
parms 
value 
length:2
elt:0 >
elt:1 >>>
full-name "template tuple<_T1,
_T2>::tuple(_T1, _T2)">
args  elt:1 >
   pending_template>
use_template=1 chain >

I assume that means HAS_DECL_ASSEMBLER_NAME_P returns false.

[Bug rtl-optimization/117297] New: late combine undoes too much

2024-10-25 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117297

            Bug ID: 117297
           Summary: late combine undoes too much
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

forked from PR117091

When the (admittedly extreme) test case from PR117091 is compiled with -O2
-fno-bit-tests -fno-jump-tables (to work around the switch scalability issues)
the compiler spends ~70% of the time in late combine doing

-   39.85%  cc1  cc1   [.] temporarily_undo_changes(int)
     39.84% temporarily_undo_changes(int)
            rtl_ssa::insn_info::calculate_cost() const
          - rtl_ssa::changes_are_worthwhile(array_slice, bool)
             - 31.12% (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
                  (anonymous namespace)::pass_late_combine::execute(function*)
                  execute_one_pass(opt_pass*)
                  execute_pass_list_1(opt_pass*)
                  execute_pass_list_1(opt_pass*)
             - 8.72% (anonymous namespace)::pass_late_combine::execute(function*)
                  execute_one_pass(opt_pass*)
                  execute_pass_list_1(opt_pass*)
                  execute_pass_list_1(opt_pass*)
-   32.56%  cc1  cc1   [.] redo_changes(int)
     32.56% rtl_ssa::insn_info::calculate_cost() const
          - rtl_ssa::changes_are_worthwhile(array_slice, bool)
             - 25.50% (anonymous namespace)::late_combine::combine_into_uses(rtl_ssa::insn_info*, rtl_ssa::insn_info*)
                  (anonymous namespace)::pass_late_combine::execute(function*)
                  execute_one_pass(opt_pass*)
                  execute_pass_list_1(opt_pass*)
                  execute_pass_list_1(opt_pass*)
             - 7.06% (anonymous namespace)::pass_late_combine::execute(function*)
                  execute_one_pass(opt_pass*)
                  execute_pass_list_1(opt_pass*)
                  execute_pass_list_1(opt_pass*)

[Bug ipa/117350] [15 Regression] ICE with autoprofiledbootstrap and bootstrap-lto after r15-4610-gbf43fe6aa966ea

2024-11-18 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117350

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #25 from ak at gcc dot gnu.org ---
Patch is committed

[Bug c++/118277] g++ ICEs with depedent inline-asm string

2025-01-02 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118277

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from ak at gcc dot gnu.org ---
Most likely it's latent, asm constexpr just reuses the existing constexpr
machinery.

static tree
initialized_type (tree t)
{
  if (TYPE_P (t))
    return t;
  tree type = TREE_TYPE (t);
  if (TREE_CODE (t) == CALL_EXPR)
    {
      /* A constructor call has void type, so we need to look deeper.  */
      tree fn = get_function_named_in_call (t);
      if (fn && TREE_CODE (fn) == FUNCTION_DECL
          && DECL_CXX_CONSTRUCTOR_P (fn))
        type = DECL_CONTEXT (fn);
    }
  else if (TREE_CODE (t) == COMPOUND_EXPR)
    return initialized_type (TREE_OPERAND (t, 1));
  else if (TREE_CODE (t) == AGGR_INIT_EXPR)
    type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
  return cv_unqualified (type);
}

But t is an unexpected scope_ref with no type, which is not handled:

(rr) pt t
 
template-info 
args 
readonly constant decl 
   index 0 level 1 orig_level 1>>>
full-name "struct to_str"
no-binfo use_template=1 interface-unknown
chain >
arg:1 >
../../tsrc/conste.cc:5:23 start: ../../tsrc/conste.cc:5:23 finish:
../../tsrc/conste.cc:5:27>

[Bug tree-optimization/118279] gcc fails to eliminate unnecessary guards around switch()

2025-01-02 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118279

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org,
                   |                            |amacleod at redhat dot com

--- Comment #3 from ak at gcc dot gnu.org ---
Seems like a ranger issue?

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-25 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |ak at gcc dot gnu.org

[Bug testsuite/117961] x86 testsuite: scan-assembler[-not] is bogus for inline asm

2024-12-10 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117961

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #7 from ak at gcc dot gnu.org ---
I suppose scan-assembler could just ignore lines starting with #.
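
A minimal sketch of that idea, assuming a plain substring match (the real scan-assembler directive takes a regular expression and lives in the DejaGnu testsuite harness, not in C). The function name scan_ignoring_hash_lines is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if NEEDLE occurs on any line of ASM_TEXT that does not
   start with '#'.  Lines starting with '#' are skipped entirely.  */
bool scan_ignoring_hash_lines (const char *asm_text, const char *needle)
{
  assert (asm_text != NULL && needle != NULL);
  size_t nlen = strlen (needle);
  const char *line = asm_text;
  while (*line)
    {
      const char *end = strchr (line, '\n');
      size_t len = end ? (size_t) (end - line) : strlen (line);
      if (line[0] != '#')
        /* Try every start offset at which NEEDLE still fits in the line.  */
        for (size_t i = 0; i + nlen <= len; i++)
          if (memcmp (line + i, needle, nlen) == 0)
            return true;
      /* Advance past this line and its newline, if there is one.  */
      line += len + (end ? 1 : 0);
    }
  return false;
}
```

This only drops lines whose first character is #. It does not skip the body between such markers, which would still be matched.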

[Bug tree-optimization/118443] New: [Meta bug] Bugs triggered by and blocking more smtgcc testing

2025-01-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118443

            Bug ID: 118443
           Summary: [Meta bug] Bugs triggered by and blocking more smtgcc
                    testing
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
        Depends on: 113703, 117186, 117688, 118174
  Target Milestone: ---

Optimizations introducing undefined behavior.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703
[Bug 113703] ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117186
[Bug 117186] [12/13/14 Regression] aarch64 wrong code for (a < b) < (b < a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117688
[Bug 117688] [15 Regression] RISC-V: Wrong code for .SAT_SUB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118174
[Bug 118174] [15 Regression] AArch64: Miscompilation at -O3 since
r15-5943-gdc0dea98c96e02

[Bug translation/40883] [meta-bug] Translation breakage with trivial fixes

2025-01-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
Bug 40883 depends on bug 80188, which changed state.

Bug 80188 Summary: calls.c: reason argument to maybe_complain_about_tail_call 
must be marked for translation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80188

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/118444] New: [Meta bug] musttail bugs

2025-01-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118444

Bug ID: 118444
   Summary: [Meta bug] musttail bugs
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ak at gcc dot gnu.org
Depends on: 115606, 115979, 116080, 116545, 118430, 118442
  Target Milestone: ---

Issues with the new musttail attribute.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115606
[Bug 115606] C++ front-end marks the return slot as addressable early on which
prevents tail call being marked
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115979
[Bug 115979] Implicitly generated C++ calls stop musttail search early
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080
[Bug 116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116545
[Bug 116545] Support old style statement attributes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118430
[Bug 118430] [14/15 Regression] tail call vs IPA-VRP return value range with
constant value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118442
[Bug 118442] -fprofile-generate wrongly adds instrumentation after musttail
call

[Bug translation/80188] calls.c: reason argument to maybe_complain_about_tail_call must be marked for translation

2025-01-12 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80188

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from ak at gcc dot gnu.org ---
This has been fixed in gcc 15.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-13 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #10 from ak at gcc dot gnu.org ---
Okay it looks like the test case just avoids the if (...) return problem by
replacing it with if (...) break. I guess the vectorizer should really be able
to do that on its own.
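
To make the two loop shapes concrete, here is a minimal sketch. It is not the test case from the bug: scan_return and scan_break are hypothetical names, and the set of characters searched for is an assumption chosen only for illustration. Both functions compute the same result; only the way the loop is left differs.

```c
#include <assert.h>
#include <stddef.h>

/* Early-return form: the loop is left through a return inside the body.  */
const char *scan_return(const char *s, const char *end)
{
    assert(s != NULL && s <= end);
    for (; s < end; s++) {
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            return s;
    }
    return end;
}

/* Break form: the same search, but the loop is left through a break and
   the function has a single return after the loop.  */
const char *scan_break(const char *s, const char *end)
{
    assert(s != NULL && s <= end);
    for (; s < end; s++) {
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            break;
    }
    return s;
}
```

Since the two forms are semantically identical, a compiler could in principle rewrite the first into the second before vectorization.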

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-13 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
On x86/avx512f the first variant still fails with 

search-line-fast.c:4:60: missed: couldn't vectorize loop
search-line-fast.c:4:60: missed: not vectorized: number of iterations cannot be
computed.

and the second variant with end condition with

search-line-fast-cond.c:3:18: missed: couldn't vectorize loop
search-line-fast-cond.c:3:18: missed: not vectorized: unsupported control flow
in loop.
search-line-fast-cond.c:1:22: note: vectorized 0 loops in function.

The first needs some pattern matching: having the break condition in the loop
vs having it in a while header shouldn't matter.

I think the latter is due to

vect_analyze_loop_form:
   
  |if (EDGE_COUNT (bbs[i]->succs) != 1


  [local count: 1044213920]:
  # prephitmp_25 = PHI <_24(4), 0(12)>
  _10 = _1 == 92;
  _13 = _10 | prephitmp_25;
  if (_13 != 0)
goto ; [8.03%]
  else
goto ; [91.97%]

   [local count: 83800315]:
  # s_19 = PHI 
  return s_19;

because the return isn't a jump out of the loop.
I'm not sure how arm avoids that problem.

[Bug c++/118277] g++ ICEs with depedent inline-asm string

2025-01-02 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118277

--- Comment #6 from ak at gcc dot gnu.org ---
Can you expand? None of the other callers of cp_parser_constant_expression seem
to do anything special for templates.

[Bug tree-optimization/118198] tail merge/cross jump should not merge abort

2025-01-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118198

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added

            Summary|tail merge should not merge |tail merge/cross jump
                   |abort                       |should not merge abort
         Resolution|INVALID                     |---
             Status|RESOLVED                    |NEW

--- Comment #10 from ak at gcc dot gnu.org ---

cfgcleanup special cases sanitizer calls too. 

Again, the same could be done for __builtin_abort.
Probably both should use a common function to check.

  /* For address sanitizer, never crossjump __asan_report_* builtins,
     otherwise errors might be reported on incorrect lines.  */
  if (flag_sanitize & SANITIZE_ADDRESS)
    {
      rtx call = get_call_rtx_from (i1);
      if (call && GET_CODE (XEXP (XEXP (call, 0), 0)) == SYMBOL_REF)
        {
          rtx symbol = XEXP (XEXP (call, 0), 0);
          if (SYMBOL_REF_DECL (symbol)
              && TREE_CODE (SYMBOL_REF_DECL (symbol)) == FUNCTION_DECL)
            {
              if ((DECL_BUILT_IN_CLASS (SYMBOL_REF_DECL (symbol))
                   == BUILT_IN_NORMAL)
                  && DECL_FUNCTION_CODE (SYMBOL_REF_DECL (symbol))
                     >= BUILT_IN_ASAN_REPORT_LOAD1
                  && DECL_FUNCTION_CODE (SYMBOL_REF_DECL (symbol))
                     <= BUILT_IN_ASAN_STOREN)
                return dir_none;
            }
        }
    }

[Bug tree-optimization/118198] tail merge should not merge abort

2024-12-31 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118198

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added

                 CC|                            |ak at gcc dot gnu.org
     Ever confirmed|0                           |1
          Component|debug                       |tree-optimization
         Resolution|WONTFIX                     |---
             Status|RESOLVED                    |NEW
            Summary|GCC wrong debug information |tail merge should not merge
                   |bug                         |abort
   Last reconfirmed|                            |2024-12-31

[Bug tree-optimization/118032] [15 regression] Bootstrap slowdown for risc-v

2024-12-23 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118032

--- Comment #29 from ak at gcc dot gnu.org ---
We could also implement greedy switch clustering for jump tables I think. Right
now it's only for the switch bitmap clustering.

[Bug target/118252] i386 should implement CASE_VECTOR_SHORTEN_MODE

2025-01-02 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118252

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #2 from ak at gcc dot gnu.org ---
I suspect most switches are small, so even with a safety factor of 2 or 4 it
would still be useful.

Or alternatively could push the decision to the assembler with some .if, but
that would definitely be more code.

[Bug middle-end/118864] Add nomerge attribute

2025-02-14 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118864

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #3 from ak at gcc dot gnu.org ---
For cross jumping it would be in old_insns_match_p I think

FWIW I tried to write a patch there for noreturn, but it didn't fix
the original issue in PR118198, likely due to some other code
transformations

diff --git a/gcc/cfgcleanup.cc b/gcc/cfgcleanup.cc
index d28d2323191..b784d5eca7a 100644
--- a/gcc/cfgcleanup.cc
+++ b/gcc/cfgcleanup.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "rtl-iter.h"
 #include "regs.h"
+#include "calls.h"
 #include "function-abi.h"

 #define FORWARDER_BLOCK_P(BB) ((BB)->flags & BB_FORWARDER_BLOCK)
@@ -1207,6 +1208,11 @@ old_insns_match_p (int mode ATTRIBUTE_UNUSED, rtx_insn
*i1, rtx_insn *i2)
  || SIBLING_CALL_P (i1) != SIBLING_CALL_P (i2))
return dir_none;

+  /* Avoid merging noreturn to improve backtraces.  */
+  if (rtx call = get_call_rtx_from (i1);
+ call && find_reg_note (call, REG_NORETURN, NULL))
+   return dir_none;
+
   /* For address sanitizer, never crossjump __asan_report_* builtins,
 otherwise errors might be reported on incorrect lines.  */
   if (flag_sanitize & SANITIZE_ADDRESS)
diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index d897970079c..fc23672930d 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -1312,6 +1312,10 @@ merge_stmts_p (gimple *stmt1, gimple *stmt2)
   if (lookup_stmt_eh_lp_fn (cfun, stmt1) != lookup_stmt_eh_lp_fn (cfun,
stmt2))
 return false;

+  /* Don't merge noreturn to give accurate backtraces.  */
+  if (is_gimple_call (stmt1) && (gimple_call_flags (stmt1) & ECF_NORETURN))
+return false;
+
   if (is_gimple_call (stmt1)
   && gimple_call_internal_p (stmt1))
 switch (gimple_call_internal_fn (stmt1))

[Bug gcov-profile/119375] Some autofdo test cases fail

2025-04-04 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119375

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #5 from ak at gcc dot gnu.org ---
Could someone bisect those failures please?

I will need to install the autofdo tools.

[Bug target/119628] Need better mechanisms to manage register saves in callee for tail calls

2025-04-05 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

--- Comment #2 from ak at gcc dot gnu.org ---
The existing attributes could just handle this case?

[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code

2025-03-25 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500

--- Comment #9 from ak at gcc dot gnu.org ---
I can test it later, but it would surprise me if it helps. The problem is not
the computation but the misses. When profiling it I see a lot of cache misses
on "cmp" memory load. So likely need to do something about the data structure.

Looking at some LBR data the list walks just seem to be too long. Several of
the iterations exceeded the 32 entry limit of the Intel LBR. A 90+ cycle
latency must be multiple cache misses. I saw up to 340 cycles just for the loop
body.

e.g. here is an excerpt with cycle data

  01278705  jnz 0x12786e0
  # PRED 74 cycles [74]
  012786e0  cmpw  $0x2, (%rbx)
  012786e4  jz 0x1278e20
  012786ea  movq  0x20(%rbx), %rbp
  012786ee  test %rbp, %rbp
  012786f1  jz 0x12786fe
  012786f3  cmpq  $0x0, 0x38(%rbp)
  012786f8  jnz 0x1278868
  012786fe  movq  0x10(%rbx), %rbx
  01278702  test %rbx, %rbx
  01278705  jnz 0x12786e0
  # PRED 78 cycles [152] 0.13 IPC
  012786e0  cmpw  $0x2, (%rbx)
  012786e4  jz 0x1278e20
  012786ea  movq  0x20(%rbx), %rbp
  012786ee  test %rbp, %rbp
  012786f1  jz 0x12786fe
  012786f3  cmpq  $0x0, 0x38(%rbp)
  012786f8  jnz 0x1278868
  012786fe  movq  0x10(%rbx), %rbx
  01278702  test %rbx, %rbx
  01278705  jnz 0x12786e0
  # PRED 356 cycles [508] 0.03 IPC
  012786e0  cmpw  $0x2, (%rbx)
  012786e4  jz 0x1278e20
  012786ea  movq  0x20(%rbx), %rbp
  012786ee  test %rbp, %rbp
  012786f1  jz 0x12786fe
  012786f3  cmpq  $0x0, 0x38(%rbp)
  012786f8  jnz 0x1278868
  012786fe  movq  0x10(%rbx), %rbx
  01278702  test %rbx, %rbx
  01278705  jnz 0x12786e0
  # PRED 24 cycles [532] 0.42 IPC
  012786e0  cmpw  $0x2, (%rbx)
  012786e4  jz 0x1278e20
  012786ea  movq  0x20(%rbx), %rbp
  012786ee  test %rbp, %rbp
  012786f1  jz 0x12786fe
  012786f3  cmpq  $0x0, 0x38(%rbp)
  012786f8  jnz 0x1278868
  012786fe  movq  0x10(%rbx), %rbx
  01278702  test %rbx, %rbx
  01278705  jnz 0x12786e0
  # PRED 94 cycles [626] 0.11 IPC
  012786e0  cmpw  $0x2, (%rbx)
  012786e4  jz 0x1278e20
  012786ea  movq  0x20(%rbx), %rbp
  012786ee  test %rbp, %rbp
  012786f1  jz 0x12786fe
  012786f3  cmpq  $0x0, 0x38(%rbp)
  012786f8  jnz 0x1278868
  012786fe  movq  0x10(%rbx), %rbx
  01278702  test %rbx, %rbx
  01278705  jnz 0x12786e0
  # PRED 70 cycles [696] 0.14 IPC
  ...

[Bug tree-optimization/119482] New: slow compilation on

2025-03-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

Bug ID: 119482
   Summary: slow compilation on
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60892&action=edit
input file

This is a file from the Ladybird browser. It uses flatten. With flatten gcc
compilation is a lot slower (40+s) vs clang (6s). The ladybird developers had
to disable it to not make the CI time out.

It doesn't look like a problem with the inliner, but the file just hitting
general scaling limits. The profile is still fairly flat, but the top hot
functions seem to be ranger and SSA related.

time g++-15 -ftime-report -std=gnu++20 -O2 interpreter.i  -S -w

Time variable  wall   GGC
 phase setup:   0.00 (  0%)  1952k (  0%)
 phase parsing  :   0.73 (  2%)   237M ( 25%)
 phase lang. deferred   :   0.28 (  1%)57M (  6%)
 phase opt and generate :  41.67 ( 98%)   651M ( 69%)
 |name lookup   :   0.12 (  0%)11M (  1%)
 |overload resolution   :   0.29 (  1%)64M (  7%)
 garbage collection :   0.39 (  1%) 0  (  0%)
 dump files :   0.02 (  0%) 0  (  0%)
 callgraph construction :   0.11 (  0%)21M (  2%)
 callgraph optimization :   0.18 (  0%)52k (  0%)
 callgraph functions expansion  :  36.11 ( 85%)   369M ( 39%)
 callgraph ipa passes   :   5.23 ( 12%)   234M ( 25%)
 ipa function summary   :   0.08 (  0%)  3583k (  0%)
 ipa dead code removal  :   0.02 (  0%) 0  (  0%)
 ipa cp :   0.11 (  0%)  3444k (  0%)
 ipa inlining heuristics:   0.19 (  0%)13M (  1%)
 ipa function splitting :   0.02 (  0%)   842k (  0%)
 ipa reference  :   0.01 (  0%) 0  (  0%)
 ipa pure const :   0.04 (  0%)32k (  0%)
 ipa icf:   0.04 (  0%)  2176  (  0%)
 ipa SRA:   0.04 (  0%)   738k (  0%)
 ipa modref :   0.03 (  0%)   793k (  0%)
 cfg construction   :   0.02 (  0%)  2180k (  0%)
 cfg cleanup:   0.78 (  2%)  5539k (  1%)
 trivially dead code:   0.12 (  0%) 0  (  0%)
 df scan insns  :   0.11 (  0%)20k (  0%)
 df reaching defs   :   1.37 (  3%) 0  (  0%)
 df live regs   :   3.91 (  9%) 0  (  0%)
 df live&initialized regs   :   4.46 ( 10%) 0  (  0%)
 df must-initialized regs   :   0.02 (  0%) 0  (  0%)
 df use-def / def-use chains:   0.20 (  0%) 0  (  0%)
 df live reg subwords   :   0.04 (  0%) 0  (  0%)
 df reg dead/unused notes   :   0.61 (  1%)  5459k (  1%)
 register information   :   0.20 (  0%) 0  (  0%)
 alias analysis :   0.26 (  1%)10M (  1%)
 alias stmt walking :   2.70 (  6%)  2275k (  0%)
 register scan  :   0.03 (  0%)   107k (  0%)
 rebuild jump labels:   0.07 (  0%) 0  (  0%)
 preprocessing  :   0.03 (  0%)  1942k (  0%)
 parser (global):   0.07 (  0%)51M (  5%)
 parser struct body :   0.10 (  0%)31M (  3%)
 parser function body   :   0.07 (  0%)11M (  1%)
 parser inl. func. body :   0.04 (  0%)  7108k (  1%)
 parser inl. meth. body :   0.14 (  0%)34M (  4%)
 template instantiation :   0.52 (  1%)   150M ( 16%)
 constant expression evaluation :   0.03 (  0%)  3423k (  0%)
 constraint satisfaction:   0.02 (  0%)  2596k (  0%)
 early inlining heuristics  :   0.06 (  0%)  9725k (  1%)
 inline parameters  :   0.12 (  0%)  6552k (  1%)
 integration:   0.76 (  2%)   178M ( 19%)
 tree gimplify  :   0.08 (  0%)15M (  2%)
 tree eh:   0.07 (  0%)  6680k (  1%)
 tree CFG construction  :   0.02 (  0%)  8370k (  1%)
 tree CFG cleanup   :   1.04 (  2%)   600k (  0%)
 tree tail merge:   0.07 (  0%)  2549k (  0%)
 tree VRP   :   0.57 (  1%)  5979k (  1%)
 tree Early VRP :   0.34 (  1%)  7126k (  1%)
 tree copy propagation  :   0.20 (  0%)   111k (  0%)
 tree PTA   :   1.73 (  4%)  5483k (  1%)
 tree SSA rewrite   :   0.03 ( 

[Bug middle-end/119482] slow compilation on ladybird interpreter

2025-03-27 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

--- Comment #5 from ak at gcc dot gnu.org ---
Also I should add that the Ladybird developers report a 40% performance
improvement from adding flatten to clang.

[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code

2025-03-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500

--- Comment #11 from ak at gcc dot gnu.org ---
Okay, it's not aliases, just all the decls of the scope.

I think it would benefit from two lists: one of marked decls and another of
decls yet to be marked, so that the already marked bindings don't need to be
re-walked.

[Bug c++/64500] push_to_top_level() shows up high during build of modern C++ code

2025-03-26 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500

--- Comment #10 from ak at gcc dot gnu.org ---
I misidentified the hot loop, it's actually this one in store_bindings:

  for (t = names; t; t = TREE_CHAIN (t))
    {
      if (TREE_CODE (t) == TREE_LIST)
        id = TREE_PURPOSE (t);
      else
        id = DECL_NAME (t);

      if (store_binding_p (id))
        bindings_need_stored.safe_push (id);
    }


So it's a list of aliases that can get long?

From the LBR log, store_binding_p is nearly always false.

Perhaps the list of ids that need to be stored can be cached?

[Bug middle-end/119482] slow compilation on ladybird interpreter

2025-03-27 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

--- Comment #3 from ak at gcc dot gnu.org ---
I ran a full comparison now. There is actually a significant regression between
g++-13 and g++-14, but -15 is roughly the same as -14. All are significantly
slower than clang:

clang++-19 -std=gnu++20 Interpreter.cpp -I ../../.. -I ../.. -w -S -o x.s -O2
ran
1.17 ± 0.19 times faster than clang++-18 -std=gnu++20 Interpreter.cpp -I
../../.. -I ../.. -w -S -o x.s -O2
5.10 ± 0.51 times faster than g++-13 -std=gnu++20 Interpreter.cpp -I
../../.. -I ../.. -w -S -o x.s -O2
5.91 ± 0.60 times faster than g++-15 -std=gnu++20 Interpreter.cpp -I
../../.. -I ../.. -w -S -o x.s -O2
6.15 ± 0.61 times faster than g++-14 -std=gnu++20 Interpreter.cpp -I
../../.. -I ../.. -w -S -o x.s -O2

Compared to clang with flatten, and judging just by cfi_startproc counts, gcc
actually generates more functions:

% grep -c cfi_startproc interpreter-clang.s 
570
% grep -c cfi_startproc interpreter-gcc.s 
610

but gcc indeed generates much more code:

   text    data     bss     dec     hex filename
 311591    1536       1  313128   4c728 interpreter-clang.o
 783346       8       2  783356   bf3fc interpreter-gcc.o

So yes, there might be a difference in flatten semantics.

I'm attaching an input file that works for clang if you want to look yourself.

[Bug middle-end/119482] slow compilation on ladybird interpreter

2025-03-27 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

--- Comment #4 from ak at gcc dot gnu.org ---
Created attachment 60902
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60902&action=edit
input file for clang testing

[Bug middle-end/119482] slow compilation on ladybird interpreter

2025-04-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

--- Comment #9 from ak at gcc dot gnu.org ---
For the ICE, I'm not sure why I'm not seeing it. The input file should have had
flatten enabled.

[Bug c++/119387] [14/15 Regression] Regression in performance by a factor of 6 when building with debugging symbols since r14-5979

2025-04-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119387

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #15 from ak at gcc dot gnu.org ---
With the PR114563 alloc_page free list patch I get 

  ../obj-fast/gcc/cc1plus-allocpage -std=gnu++20 -O2 pr119387.cc  -quiet  ran
1.04 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2
pr119387.cc  -quiet
2.63 ± 0.01 times faster than ../obj-fast/gcc/cc1plus-allocpage
-std=gnu++20 -O2 pr119387.cc  -quiet -ggdb
2.78 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2
pr119387.cc  -quiet -ggdb

[Bug middle-end/114563] ggc_internal_alloc is slow

2025-04-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114563

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #13 from ak at gcc dot gnu.org ---

>so my idea was to have multiple freelists so that p->bytes == entry_size
>and this list walk, which is the bottleneck for PR119387 I think, is
>improved.

It should be improved because the first element will nearly always match.

I only kept the comparison for the fallback case: if there is no free list for
a given size, it puts that size into freelist[0]. But I'm not sure this can
actually happen (for simple tests it never triggers). If the fallback is
removed the comparison could be removed too, but it probably doesn't matter for
performance.

>Using your patch this changes to

>Samples: 1M of event 'cycles:Pu', Event count (approx.): 1053172130606
>   0.02%   234  cc1plus  cc1plus  [.] alloc_page(unsigned int)

>so the patch works as intended!

Great. I will submit it for phase 1 if I don't forget.

[Bug middle-end/114563] ggc_internal_alloc is slow

2025-04-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114563

--- Comment #14 from ak at gcc dot gnu.org ---
>to do this for entry_size < G.pagesize * GGC_QUIRE_SIZE, this should
>avoid fragmenting the virtual address space.  Possibly do this only
>for USING_MADVISE, not sure.

Okay let me test that.

[Bug middle-end/119482] slow compilation on ladybird interpreter

2025-04-01 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

--- Comment #8 from ak at gcc dot gnu.org ---
The workload does a lot of bitmap manipulations:

#
 5.62%  cc1plus  cc1plus   [.] bitmap_and_into(bitmap_head*, bitmap_head const*)
 5.30%  cc1plus  cc1plus   [.] bitmap_element_allocate(bitmap_head*)
 3.93%  cc1plus  cc1plus   [.] bitmap_ior_into(bitmap_head*, bitmap_head const*)
 3.85%  cc1plus  cc1plus   [.] bitmap_list_find_element(bitmap_head*, unsigned int)
 2.09%  cc1plus  cc1plus   [.] bitmap_and(bitmap_head*, bitmap_head const*, bitmap_head const*)
 1.77%  cc1plus  cc1plus   [.] bitmap_elt_ior(bitmap_head*, bitmap_element*, bitmap_element*, bitmap_element const*, bitmap_element const*>
 1.44%  cc1plus  cc1plus   [.] bitmap_set_bit(bitmap_head*, int)
 1.43%  cc1plus  cc1plus   [.] bitmap_copy(bitmap_head*, bitmap_head const*)
 1.41%  cc1plus  cc1plus   [.] bitmap_ior_and_compl(bitmap_head*, bitmap_head const*, bitmap_head const*, bitmap_head const*)
 1.33%  cc1plus  cc1plus   [.] bitmap_elt_copy(bitmap_head*, bitmap_element*, bitmap_element*, bitmap_element const*, bool)


Looking at the samples, most of it seems to be cache misses of some sort
(working set too big), but bitmap_set_bit stands out by having a misprediction.

This simple patch improves runtime by 15%, which is more than I expected given
it only has ~1.44% of the cycles, but I guess the mispredicts caused some
downstream effects.

  ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet ran
1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2
pr119482.cc  -quiet

Most callers of bitmap_set_bit don't need the return value, but with the
conditional store the CPU still has to predict it correctly since gcc doesn't
know how to do that without APX (even though CMOV could do it with a dummy
target). If we make the write unconditional the memory bandwidth increases, but
that is made up for by fewer mispredictions.
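
The trade-off described above can be shown in isolation. The following is a minimal sketch, assuming a flat array of 64-bit words instead of GCC's linked bitmap elements; set_bit_conditional, set_bit_unconditional and WORD_BITS are hypothetical names for illustration, not GCC functions. Both variants return whether the bit was previously clear and leave the bitmap in the same state; the second one simply always performs the store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 64u  /* hypothetical word size for this sketch */

/* Conditional store: the word is only written when the bit was clear,
   so the store is guarded by a data-dependent branch.  */
bool set_bit_conditional(uint64_t *words, unsigned bit)
{
    assert(words != 0);
    uint64_t bit_val = (uint64_t)1 << (bit % WORD_BITS);
    uint64_t *w = &words[bit / WORD_BITS];
    bool res = (*w & bit_val) == 0;
    if (res)
        *w |= bit_val;
    return res;
}

/* Unconditional store: the word is always written.  This costs a store
   even when nothing changes, but leaves no branch to predict.  */
bool set_bit_unconditional(uint64_t *words, unsigned bit)
{
    assert(words != 0);
    uint64_t bit_val = (uint64_t)1 << (bit % WORD_BITS);
    uint64_t *w = &words[bit / WORD_BITS];
    bool res = (*w & bit_val) == 0;
    *w |= bit_val;
    return res;
}
```

Whether a given compiler emits a branch for the first variant depends on the target and optimization level, so the effect should be measured rather than assumed.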

From the performance counter results it doesn't do much to the bandwidth,
but reduces the number of branches drastically. Even though the misprediction
rate goes up, far fewer cycles are wasted because there are fewer branches.

$ perf stat -e
branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/
-a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet -w

 Performance counter stats for 'system wide':

41,932,957,091  branches
   686,117,623  branch-misses             #    1.64% of all branches
 43,690.47 MiB  uncore_imc/cas_count_read/
 12,362.56 MiB  uncore_imc/cas_count_write/

  49.328633365 seconds time elapsed

$ perf stat -e
branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/
-a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet -w

 Performance counter stats for 'system wide':

37,092,113,179  branches
   663,641,708  branch-misses             #    1.79% of all branches
 43,196.52 MiB  uncore_imc/cas_count_read/
 12,369.33 MiB  uncore_imc/cas_count_write/

  42.632458350 seconds time elapsed

Patch:

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index f5a64b495ab3..7744f8f8c2e4 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
   if (ptr != 0)
 {
   bool res = (ptr->bits[word_num] & bit_val) == 0;
-  if (res)
-   ptr->bits[word_num] |= bit_val;
+  ptr->bits[word_num] |= bit_val;
   return res;
 }

[Bug target/119628] New: Need better mechanisms to manage register saves in callee for tail calls

2025-04-04 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

Bug ID: 119628
   Summary: Need better mechanisms to manage register saves in
callee for tail calls
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ak at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64

Created attachment 60997
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60997&action=edit
toy byte code interpreter

To compile the test case use

One use case for the musttail feature is to write threaded interpreters with
individual small functions each implementing a byte code and calling the next
function in the byte code program using musttail. 

This is a replacement for an older code style that put all these byte code
handlers into a large function and called them using indirect goto.

See the attached test case as an example.
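
For readers without the attachment, the style can be sketched as follows. This is not the attached test case: the opcodes, handler names, the DISPATCH and MUSTTAIL macros and the run function are all hypothetical, and the feature check via __has_attribute is an assumption about the compiler in use. On compilers without the attribute the macro expands to nothing and the calls become ordinary calls with no tail call guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Use the musttail statement attribute when the compiler reports it.  */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL /* fallback: plain call, no tail call guarantee */
#endif

enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Every handler has the same signature, as musttail requires.  */
typedef long (*handler_fn)(const unsigned char *pc, long acc);

static long op_inc(const unsigned char *pc, long acc);
static long op_double(const unsigned char *pc, long acc);
static long op_halt(const unsigned char *pc, long acc);

static const handler_fn handlers[OP_COUNT] = { op_inc, op_double, op_halt };

/* Fetch the opcode at pc and tail call its handler, passing the address
   of the following opcode.  Programs are assumed to be well formed.  */
#define DISPATCH(pc, acc) \
    MUSTTAIL return handlers[*(pc)]((pc) + 1, (acc))

static long op_inc(const unsigned char *pc, long acc)
{
    acc += 1;
    DISPATCH(pc, acc);
}

static long op_double(const unsigned char *pc, long acc)
{
    acc *= 2;
    DISPATCH(pc, acc);
}

static long op_halt(const unsigned char *pc, long acc)
{
    (void)pc;
    return acc;
}

/* Entry point: an ordinary (non-tail) call into the first handler.  */
long run(const unsigned char *program)
{
    assert(program != NULL);
    return handlers[*program](program + 1, 0);
}
```

Calling run on the byte sequence OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT walks through five handlers and yields the accumulator value left by the last arithmetic opcode.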

This works fine for small functions that fit into the callee scratch registers
in the x86-64 ABI. But when you have more complex functions that need more
registers the individual functions start saving/restoring the registers that
are supposed to be callee saved (this is simulated using inline asm in the test
case, thanks to Andrew Pinski for that trick).

You can see that in the test case: if you make the SAVE_REGS/DONT_SAVE_REGS
empty, there are lots of extra push/pops on each opcode.

Now this can be changed by modifying the calling convention as it's done in the
unmodified test case. The original caller of the byte code can save all and the
rest of the tail called byte code functions none. LLVM has
preserve_none/most/all for this and it is used in the field for this.

When the tail called functions are not called through pointers gcc has -fipa-ra
for static functions, which should take care of it. But unfortunately this only
works for direct calls because for indirects the IPA cgraph RTL mechanism
doesn't work.

gcc has no_callee_saved_registers/no_caller_saved_registers which was
originally developed for a different use case (fast interrupt handlers in OS)
but can modify the callee registers saving. The main drawback of them is that
they require -mgeneral-regs-only (as they were designed for an OS), which makes
it impossible to use floating point in the interpreter code. While this works
for the toy example it's probably a show stopper for real interpreters.

Another problem with them is that they don't affect the caller unlike the LLVM
attributes. Luckily for the tail call case the shrink wrapping code takes care
of this, although it's a problem if the byte code functions are called non-tail
for some reason (e.g. in the first function of the interpreter), as well as for
other use cases (e.g. to use them to optimize calling of general cold
functions).

gcc should:
- support no_callee_saved_registers/no_caller_saved_registers without
-mgeneral-regs-only (there might be already bugs for this, but I'm filing it
separately to track the particular use case)
- figure out how -fipa-ra can be made to work for indirects? (maybe with some
type based analysis)
- Make the attributes affect the caller
- Do we need an equivalent of preserve_most?
- Once no_callee/caller_saved_registers work similarly to clang's attributes,
perhaps they should be aliased for compatibility.

[Bug c++/64500] push_to_top_level() shows up high during Chromium build.

2025-03-23 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #4 from ak at gcc dot gnu.org ---
I see the same when building things like LLVM with a modern gcc.
push_to_top_level consistently uses 5-6% of the CPU time and is by far the most
expensive function of the compiler.

The hot comparison is the global_scope_p check below.

Need a better data structure?


  /* Have to include the global scope, because class-scope decls
     aren't listed anywhere useful.  */
  for (; b; b = b->level_chain)
    {
      tree t;

      /* Template IDs are inserted into the global level. If they were
         inserted into namespace level, finish_file wouldn't find them
         when doing pending instantiations. Therefore, don't stop at
         namespace level, but continue until :: .  */
      if (global_scope_p (b))
        break;

[Bug c++/64500] push_to_top_level() shows up high during Chromium build.

2025-03-23 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64500

ak at gcc dot gnu.org changed:

   What|Removed |Added

Version|5.0 |14.0
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-03-24
