https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
--- Comment #96 from Richard Biener <rguenth at gcc dot gnu.org> ---
The full testcase on trunk (g:95d94b52ea8478334fb92cca545f0bd904bd0034) at
-O0 -g now takes 9s to compile and uses 1GB of RAM.
With -O1 -g we have
Time variable                        usr            sys           wall           GGC
 callgraph functions expansion :   13.41 ( 12%)    0.21 ( 60%)   13.63 ( 12%)   439M ( 73%)
 callgraph ipa passes          :   94.79 ( 86%)    0.13 ( 37%)   94.95 ( 86%)    75M ( 13%)
 ipa function summary          :   91.46 ( 83%)    0.02 (  6%)   91.53 ( 83%)    17M (  3%)
 tree PTA                      :    5.78 (  5%)    0.05 ( 14%)    5.85 (  5%)    23M (  4%)
 TOTAL                         :  109.96           0.35         110.37          597M
109.97user 0.37system 1:50.38elapsed 99%CPU (0avgtext+0avgdata 1110568maxresident)k
0inputs+0outputs (0major+350549minor)pagefaults 0swaps
where perf shows
Samples: 448K of event 'cycles:u', Event count (approx.): 483237005145
Overhead  Samples  Command  Shared Object  Symbol
  17.26%    77187  f951     f951           [.] get_ref_base_and_extent
   8.36%    37385  f951     f951           [.] stmt_may_clobber_ref_p_1
   7.16%    32045  f951     f951           [.] default_binds_local_p_3
   6.40%    28628  f951     f951           [.] bitmap_bit_p
   6.39%    28557  f951     f951           [.] determine_known_aggregate_parts
   5.92%    26464  f951     f951           [.] pt_solution_includes_1
   4.66%    20834  f951     f951           [.] call_may_clobber_ref_p_1
   3.44%    15406  f951     f951           [.] flags_from_decl_or_type
   3.35%    14971  f951     f951           [.] refs_may_alias_p_1
   3.05%    13667  f951     f951           [.] gimple_call_flags
   2.55%    11387  f951     f951           [.] cgraph_node::get_availability
   2.40%    10739  f951     libc-2.26.so   [.] __strncmp_sse42
   2.32%    10372  f951     f951           [.] check_fnspec
   1.89%     8411  f951     f951           [.] bitmap_set_bit
   1.71%     7635  f951     f951           [.] private_lookup_attribute
   1.68%     7512  f951     f951           [.] get_modref_function_summary
   1.52%     6805  f951     f951           [.] decl_binds_to_current_def_p
   1.46%     6512  f951     f951           [.] gimple_call_fnspec
   1.26%     5582  f951     f951           [.] bitmap_clear_bit
   0.94%     4212  f951     f951           [.] cgraph_node::function_or_virtual_thunk_symbol
We need to do something about the IPA fnsummary cost; it looks unreasonable
compared to everything else, at least for -O1. Cutting down
--param ipa-max-aa-steps doesn't seem to help, but it looks like the
accounting is simply broken.
And with -O2 or -O3 we have
Time variable                        usr            sys           wall           GGC
 callgraph functions expansion :  201.23 ( 20%)    0.77 ( 46%)  202.05 ( 20%)  1230M ( 82%)
 callgraph ipa passes          :  807.58 ( 80%)    0.86 ( 52%)  808.75 ( 80%)   201M ( 13%)
 ipa inlining heuristics       :   40.25 (  4%)    0.01 (  1%)   40.24 (  4%)    41M (  3%)
 alias stmt walking            :   21.48 (  2%)    0.20 ( 12%)   21.72 (  2%)   601k (  0%)
 tree PTA                      :  788.36 ( 78%)    0.76 ( 46%)  789.43 ( 78%)   101M (  7%)
 tree slp vectorization        :   13.97 (  1%)    0.04 (  2%)   14.01 (  1%)   225M ( 15%)
 expand vars                   :   92.66 (  9%)    0.00 (  0%)   92.72 (  9%)    63M (  4%)
 TOTAL                         : 1010.42           1.66        1012.46         1509M
1010.42user 1.73system 16:52.53elapsed 99%CPU (0avgtext+0avgdata 4764428maxresident)k
0inputs+0outputs (0major+1199966minor)pagefaults 0swaps
Surprisingly, the IPA fnsummary issue is -O1 only, but maybe it's an
accounting issue. perf with callgraph points to (if I interpret it correctly)
the determine_known_aggregate_parts function which, while accounting for
alias queries done via get_continuation_for_phi, does not account for those
done by walking the VDEF chain itself. I'm testing a fix.
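
To illustrate the kind of under-accounting suspected above, here is a toy
model in C++. It is only a sketch and not GCC code: Stmt, Kind, WalkResult
and walk_chain are hypothetical names, the "chain" is a plain vector rather
than a real VDEF chain, and the budget merely plays the role that
--param ipa-max-aa-steps is meant to play. It shows how a walk that charges
only PHI handling against its budget can do an unbounded number of alias
queries, while charging every visited statement keeps the walk bounded.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for statements on a virtual-definition chain.
enum class Kind { Store, Phi };

struct Stmt {
    Kind kind;
    int location;  // abstract memory location the statement writes
};

struct WalkResult {
    std::optional<std::size_t> def_index;  // index of the defining store, if found
    unsigned queries_done;                 // alias queries actually performed
    unsigned queries_charged;              // alias queries counted against the budget
};

// Walk the chain backwards from its end, looking for a store to `location`.
// charge_every_step == false: only PHIs are charged (the suspected bug).
// charge_every_step == true:  every visited statement is charged.
WalkResult walk_chain(const std::vector<Stmt>& chain, int location,
                      unsigned budget, bool charge_every_step) {
    WalkResult r{std::nullopt, 0, 0};
    for (std::size_t i = chain.size(); i-- > 0;) {
        if (r.queries_charged >= budget)
            break;  // budget exhausted: give up conservatively
        const Stmt& s = chain[i];
        ++r.queries_done;  // one "may this statement clobber the ref?" query
        if (charge_every_step || s.kind == Kind::Phi)
            ++r.queries_charged;
        if (s.kind == Kind::Store && s.location == location) {
            r.def_index = i;
            break;
        }
    }
    return r;
}

// Small demonstration: a long PHI-free chain whose only matching store is
// at the very beginning.
void demo() {
    std::vector<Stmt> chain(1000, Stmt{Kind::Store, 1});
    chain[0] = Stmt{Kind::Store, 0};

    // PHI-only accounting never charges anything here, so the budget of 10
    // has no effect and all 1000 statements are queried.
    WalkResult loose = walk_chain(chain, 0, 10, false);
    assert(loose.queries_done == 1000 && loose.queries_charged == 0);

    // Charging every step stops the walk after 10 queries.
    WalkResult strict = walk_chain(chain, 0, 10, true);
    assert(strict.queries_done == 10 && !strict.def_index.has_value());
}
```

In this model, lowering the budget changes nothing for the PHI-only scheme
on straight-line chains, which would be consistent with the observation that
cutting down the parameter does not help. Whether the real fix takes this
shape is not established here.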