https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
--- Comment #96 from Richard Biener <rguenth at gcc dot gnu.org> ---

The full testcase on trunk (g:95d94b52ea8478334fb92cca545f0bd904bd0034) at -O0 -g now takes 9s to compile and uses 1GB ram. With -O1 -g we have

Time variable                          usr            sys           wall            GGC
 callgraph functions expansion    :  13.41 ( 12%)    0.21 ( 60%)   13.63 ( 12%)    439M ( 73%)
 callgraph ipa passes             :  94.79 ( 86%)    0.13 ( 37%)   94.95 ( 86%)     75M ( 13%)
 ipa function summary             :  91.46 ( 83%)    0.02 (  6%)   91.53 ( 83%)     17M (  3%)
 tree PTA                         :   5.78 (  5%)    0.05 ( 14%)    5.85 (  5%)     23M (  4%)
 TOTAL                            : 109.96           0.35         110.37            597M
109.97user 0.37system 1:50.38elapsed 99%CPU (0avgtext+0avgdata 1110568maxresident)k
0inputs+0outputs (0major+350549minor)pagefaults 0swaps

where perf shows

Samples: 448K of event 'cycles:u', Event count (approx.): 483237005145
Overhead  Samples  Command  Shared Object  Symbol
  17.26%    77187  f951     f951           [.] get_ref_base_and_extent
   8.36%    37385  f951     f951           [.] stmt_may_clobber_ref_p_1
   7.16%    32045  f951     f951           [.] default_binds_local_p_3
   6.40%    28628  f951     f951           [.] bitmap_bit_p
   6.39%    28557  f951     f951           [.] determine_known_aggregate_parts
   5.92%    26464  f951     f951           [.] pt_solution_includes_1
   4.66%    20834  f951     f951           [.] call_may_clobber_ref_p_1
   3.44%    15406  f951     f951           [.] flags_from_decl_or_type
   3.35%    14971  f951     f951           [.] refs_may_alias_p_1
   3.05%    13667  f951     f951           [.] gimple_call_flags
   2.55%    11387  f951     f951           [.] cgraph_node::get_availability
   2.40%    10739  f951     libc-2.26.so   [.] __strncmp_sse42
   2.32%    10372  f951     f951           [.] check_fnspec
   1.89%     8411  f951     f951           [.] bitmap_set_bit
   1.71%     7635  f951     f951           [.] private_lookup_attribute
   1.68%     7512  f951     f951           [.] get_modref_function_summary
   1.52%     6805  f951     f951           [.] decl_binds_to_current_def_p
   1.46%     6512  f951     f951           [.] gimple_call_fnspec
   1.26%     5582  f951     f951           [.] bitmap_clear_bit
   0.94%     4212  f951     f951           [.] cgraph_node::function_or_virtual_thunk_symbol

we need to do sth about the IPA fnsummary cost, it looks unreasonable compared to all the rest, at least for -O1.
Cutting down --param ipa-max-aa-steps doesn't seem to help, but it looks like the accounting is simply broken.

And with -O2 or -O3 we have

Time variable                          usr            sys           wall            GGC
 callgraph functions expansion    : 201.23 ( 20%)    0.77 ( 46%)  202.05 ( 20%)   1230M ( 82%)
 callgraph ipa passes             : 807.58 ( 80%)    0.86 ( 52%)  808.75 ( 80%)    201M ( 13%)
 ipa inlining heuristics          :  40.25 (  4%)    0.01 (  1%)   40.24 (  4%)     41M (  3%)
 alias stmt walking               :  21.48 (  2%)    0.20 ( 12%)   21.72 (  2%)    601k (  0%)
 tree PTA                         : 788.36 ( 78%)    0.76 ( 46%)  789.43 ( 78%)    101M (  7%)
 tree slp vectorization           :  13.97 (  1%)    0.04 (  2%)   14.01 (  1%)    225M ( 15%)
 expand vars                      :  92.66 (  9%)    0.00 (  0%)   92.72 (  9%)     63M (  4%)
 TOTAL                            :1010.42           1.66        1012.46          1509M
1010.42user 1.73system 16:52.53elapsed 99%CPU (0avgtext+0avgdata 4764428maxresident)k
0inputs+0outputs (0major+1199966minor)pagefaults 0swaps

Surprisingly, the IPA fnsummary issue is -O1 only, but maybe it's an accounting issue. perf with callgraph points to (if I interpret correctly) the determine_known_aggregate_parts function which, while accounting for alias queries done via get_continuation_for_phi, does not account for those done by walking the VDEF chain itself. I'm testing a fix.
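
To illustrate the kind of accounting gap described above, here is a toy model. It is not GCC code: Step, count_queries and charge_defs are hypothetical names, and the loop only mimics a walk with a query budget where some steps are charged and others are not. It assumes the budget plays the role of a limit like ipa-max-aa-steps.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a walk over a chain of memory definitions.
// Phi: a step whose alias query is charged against the budget.
// Def: a plain step on the chain, charged only if charge_defs is true.
enum class Step { Def, Phi };

// Returns how many alias queries the walk performs before it stops.
// limit is the query budget; charge_defs selects whether plain chain
// steps count against it (true) or are left unaccounted (false).
long count_queries(const std::vector<Step>& chain, int limit, bool charge_defs) {
    assert(limit >= 0);
    int used = 0;      // budget consumed so far
    long queries = 0;  // alias queries actually performed
    for (Step s : chain) {
        const bool charged = (s == Step::Phi) || charge_defs;
        if (charged && used >= limit)
            break;     // budget exhausted, give up the walk
        if (charged)
            ++used;
        ++queries;     // the query itself costs time either way
    }
    return queries;
}
```

In this model a chain made only of Def steps with charge_defs set to false never consumes the budget, so lowering the limit has no effect on the work done. That would be consistent with a smaller --param value not helping. With charge_defs set to true the walk stops once the limit is reached.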