https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482
Bug ID: 119482
Summary: slow compilation on
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ak at gcc dot gnu.org
Target Milestone: ---

Created attachment 60892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60892&action=edit
input file

This is a file from the Ladybird browser. It uses the flatten attribute. With flatten, GCC takes much longer to compile it (40+ s) than clang does (6 s). The Ladybird developers had to disable flatten to keep their CI from timing out.

It doesn't look like a problem with the inliner; rather, the file just seems to be hitting general scaling limits. The profile is still fairly flat, but the hottest functions seem to be ranger- and SSA-related.

time g++-15 -ftime-report -std=gnu++20 -O2 interpreter.i -S -w

Time variable                         wall          GGC
 phase setup                      :   0.00 (  0%)  1952k (  0%)
 phase parsing                    :   0.73 (  2%)   237M ( 25%)
 phase lang. deferred             :   0.28 (  1%)    57M (  6%)
 phase opt and generate           :  41.67 ( 98%)   651M ( 69%)
 |name lookup                     :   0.12 (  0%)    11M (  1%)
 |overload resolution             :   0.29 (  1%)    64M (  7%)
 garbage collection               :   0.39 (  1%)      0 (  0%)
 dump files                       :   0.02 (  0%)      0 (  0%)
 callgraph construction           :   0.11 (  0%)    21M (  2%)
 callgraph optimization           :   0.18 (  0%)    52k (  0%)
 callgraph functions expansion    :  36.11 ( 85%)   369M ( 39%)
 callgraph ipa passes             :   5.23 ( 12%)   234M ( 25%)
 ipa function summary             :   0.08 (  0%)  3583k (  0%)
 ipa dead code removal            :   0.02 (  0%)      0 (  0%)
 ipa cp                           :   0.11 (  0%)  3444k (  0%)
 ipa inlining heuristics          :   0.19 (  0%)    13M (  1%)
 ipa function splitting           :   0.02 (  0%)   842k (  0%)
 ipa reference                    :   0.01 (  0%)      0 (  0%)
 ipa pure const                   :   0.04 (  0%)    32k (  0%)
 ipa icf                          :   0.04 (  0%)   2176 (  0%)
 ipa SRA                          :   0.04 (  0%)   738k (  0%)
 ipa modref                       :   0.03 (  0%)   793k (  0%)
 cfg construction                 :   0.02 (  0%)  2180k (  0%)
 cfg cleanup                      :   0.78 (  2%)  5539k (  1%)
 trivially dead code              :   0.12 (  0%)      0 (  0%)
 df scan insns                    :   0.11 (  0%)    20k (  0%)
 df reaching defs                 :   1.37 (  3%)      0 (  0%)
 df live regs                     :   3.91 (  9%)      0 (  0%)
 df live&initialized regs         :   4.46 ( 10%)      0 (  0%)
 df must-initialized regs         :   0.02 (  0%)      0 (  0%)
 df use-def / def-use chains      :   0.20 (  0%)      0 (  0%)
 df live reg subwords             :   0.04 (  0%)      0 (  0%)
 df reg dead/unused notes         :   0.61 (  1%)  5459k (  1%)
 register information             :   0.20 (  0%)      0 (  0%)
 alias analysis                   :   0.26 (  1%)    10M (  1%)
 alias stmt walking               :   2.70 (  6%)  2275k (  0%)
 register scan                    :   0.03 (  0%)   107k (  0%)
 rebuild jump labels              :   0.07 (  0%)      0 (  0%)
 preprocessing                    :   0.03 (  0%)  1942k (  0%)
 parser (global)                  :   0.07 (  0%)    51M (  5%)
 parser struct body               :   0.10 (  0%)    31M (  3%)
 parser function body             :   0.07 (  0%)    11M (  1%)
 parser inl. func. body           :   0.04 (  0%)  7108k (  1%)
 parser inl. meth. body           :   0.14 (  0%)    34M (  4%)
 template instantiation           :   0.52 (  1%)   150M ( 16%)
 constant expression evaluation   :   0.03 (  0%)  3423k (  0%)
 constraint satisfaction          :   0.02 (  0%)  2596k (  0%)
 early inlining heuristics        :   0.06 (  0%)  9725k (  1%)
 inline parameters                :   0.12 (  0%)  6552k (  1%)
 integration                      :   0.76 (  2%)   178M ( 19%)
 tree gimplify                    :   0.08 (  0%)    15M (  2%)
 tree eh                          :   0.07 (  0%)  6680k (  1%)
 tree CFG construction            :   0.02 (  0%)  8370k (  1%)
 tree CFG cleanup                 :   1.04 (  2%)   600k (  0%)
 tree tail merge                  :   0.07 (  0%)  2549k (  0%)
 tree VRP                         :   0.57 (  1%)  5979k (  1%)
 tree Early VRP                   :   0.34 (  1%)  7126k (  1%)
 tree copy propagation            :   0.20 (  0%)   111k (  0%)
 tree PTA                         :   1.73 (  4%)  5483k (  1%)
 tree SSA rewrite                 :   0.03 (  0%)  4929k (  1%)
 tree SSA incremental             :   0.88 (  2%)    17M (  2%)
 tree operand scan                :   0.12 (  0%)    24M (  3%)
 dominator optimization           :   0.87 (  2%)    18M (  2%)
 backwards jump threading         :   0.53 (  1%)  4951k (  1%)
 tree SRA                         :   0.21 (  1%)    11M (  1%)
 isolate eroneous paths           :   0.02 (  0%)   2352 (  0%)
 tree CCP                         :   0.51 (  1%)  1631k (  0%)
 tree split crit edges            :   0.01 (  0%)  3009k (  0%)
 tree reassociation               :   0.07 (  0%)   168k (  0%)
 tree PRE                         :   0.48 (  1%)    12M (  1%)
 tree FRE                         :   0.92 (  2%)  6295k (  1%)
 tree RPO VN                      :   0.07 (  0%)   330k (  0%)
 tree code sinking                :   0.10 (  0%)  6166k (  1%)
 tree linearize phis              :   0.08 (  0%)  1636k (  0%)
 tree backward propagate          :   0.01 (  0%)      0 (  0%)
 tree forward propagate           :   0.27 (  1%)  2431k (  0%)
 tree phiprop                     :   0.01 (  0%)      0 (  0%)
 tree conservative DCE            :   0.18 (  0%)   288k (  0%)
 tree aggressive DCE              :   0.15 (  0%)  4133k (  0%)
 tree DSE                         :   0.30 (  1%)  4487k (  0%)
 PHI merge                        :   0.01 (  0%)   479k (  0%)
 tree loop invariant motion       :   0.10 (  0%)    76k (  0%)
 tree canonical iv                :   0.01 (  0%)   361k (  0%)
 complete unrolling               :   0.09 (  0%)   974k (  0%)
 tree slp vectorization           :   0.20 (  0%)    28M (  3%)
 tree loop distribution           :   0.01 (  0%)    45k (  0%)
 tree iv optimization             :   0.04 (  0%)  1761k (  0%)
 tree copy headers                :   0.09 (  0%)   922k (  0%)
 tree SSA uncprop                 :   0.03 (  0%)      0 (  0%)
 gimple widening/fma detection    :   0.02 (  0%)   278k (  0%)
 tree strlen optimization         :   0.06 (  0%)   554k (  0%)
 tree modref                      :   0.06 (  0%)  2558k (  0%)
 dominance frontiers              :   0.09 (  0%)      0 (  0%)
 dominance computation            :   0.92 (  2%)      0 (  0%)
 control dependences              :   0.01 (  0%)      0 (  0%)
 out of ssa                       :   0.07 (  0%)   181k (  0%)
 expand vars                      :   0.23 (  1%)  7554k (  1%)
 expand                           :   0.27 (  1%)    46M (  5%)
 post expand cleanups             :   0.05 (  0%)  2218k (  0%)
 lower subreg                     :   0.04 (  0%)  1607k (  0%)
 forward prop                     :   0.36 (  1%)   682k (  0%)
 CSE                              :   0.54 (  1%)  2209k (  0%)
 dead code elimination            :   0.12 (  0%)      0 (  0%)
 dead store elim1                 :   0.21 (  0%)  3492k (  0%)
 dead store elim2                 :   0.17 (  0%)  4180k (  0%)
 loop init                        :   0.60 (  1%)    11M (  1%)
 loop invariant motion            :   0.02 (  0%)   7656 (  0%)
 loop unrolling                   :   0.06 (  0%)   214k (  0%)
 loop fini                        :   0.02 (  0%)   131k (  0%)
 CPROP                            :   0.17 (  0%)  3932k (  0%)
 PRE                              :   0.08 (  0%)   417k (  0%)
 CSE 2                            :   0.39 (  1%)  1973k (  0%)
 branch prediction                :   0.10 (  0%)  1896k (  0%)
 combiner                         :   0.66 (  2%)    15M (  2%)
 late combiner                    :   0.35 (  1%)   749k (  0%)
 if-conversion                    :   0.10 (  0%)  1905k (  0%)
 integrated RA                    :   1.63 (  4%)    30M (  3%)
 LRA non-specific                 :   0.53 (  1%)  2955k (  0%)
 LRA virtuals elimination         :   0.08 (  0%)  2964k (  0%)
 LRA reload inheritance           :   0.02 (  0%)   482k (  0%)
 LRA create live ranges           :   1.91 (  4%)   369k (  0%)
 LRA hard reg assignment          :   0.03 (  0%)      0 (  0%)
 LRA rematerialization            :   0.18 (  0%)    736 (  0%)
 reload CSE regs                  :   0.45 (  1%)  6837k (  1%)
 ree                              :   0.05 (  0%)    48k (  0%)
 thread pro- & epilogue           :   0.12 (  0%)  1256k (  0%)
 if-conversion 2                  :   0.04 (  0%)    23k (  0%)
 combine stack adjustments        :   0.03 (  0%)      0 (  0%)
 peephole 2                       :   0.07 (  0%)   746k (  0%)
 hard reg cprop                   :   0.12 (  0%)    34k (  0%)
 scheduling 2                     :   1.00 (  2%)  2445k (  0%)
 machine dep reorg                :   0.09 (  0%)   6656 (  0%)
 reorder blocks                   :   0.14 (  0%)  3097k (  0%)
 duplicate computed gotos         :   0.01 (  0%)  1565k (  0%)
 shorten branches                 :   0.11 (  0%)    432 (  0%)
 final                            :   0.22 (  1%) 10000k (  1%)
 tree if-combine                  :   0.02 (  0%)   249k (  0%)
 if to switch conversion          :   0.02 (  0%)   1104 (  0%)
 straight-line strength reduction :   0.02 (  0%)    43k (  0%)
 store merging                    :   0.05 (  0%)  1623k (  0%)
 access analysis                  :   0.10 (  0%)    25k (  0%)
 ext dce                          :   0.03 (  0%)      0 (  0%)
 fold mem offsets                 :   0.04 (  0%)   404k (  0%)
 rest of compilation              :   0.66 (  2%)  4816k (  0%)
 remove unused locals             :   0.27 (  1%)    12k (  0%)
 address taken                    :   0.13 (  0%)    448 (  0%)
 rebuild frequencies              :   0.04 (  0%)   470k (  0%)
 TOTAL                            :  42.70         947M

real    0m42.741s
user    0m42.099s
sys     0m0.496s

     3.03%  cc1plus  cc1plus  [.] bitmap_ior_into(bitmap_head*, bitmap_head const*)
            |
            |--2.40%--bitmap_ior_into(bitmap_head*, bitmap_head const*)
            |          |
            |          |--1.08%--0x1be573b
            |          |          |
            |          |           --0.62%--0x1be439d
            |          |                    df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
            |          |
            |           --0.51%--0x1c9eb64
            |                    df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
            |
             --0.61%--gori_map::calculate_gori(basic_block_def*)
                      ranger_cache::ranger_cache(int, bool)
                       |
                        --0.58%--gimple_ranger::gimple_ranger(bool)

     2.99%  cc1plus  cc1plus  [.] 0x00000000015a7203
            |
            ---0x19a71d0
               |
               |--1.64%--bitmap_ior_into(bitmap_head*, bitmap_head const*)
               |          |
               |           --1.04%--0x1be573b
               |
                --0.79%--bitmap_ior_and_compl(bitmap_head*, bitmap_head const*, bitmap_head const*, bitmap_head const*)
                          |
                           --0.74%--0x1be4246
                                    df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)

(The backtrace seems to be corrupted on that one; I truncated it, and the callers may not be correct.)

     2.72%  cc1plus  cc1plus  [.] bitmap_set_bit(bitmap_head*, int)
            |
             --2.64%--bitmap_set_bit(bitmap_head*, int)
                       |
                        --0.56%--0x1bffbda
                                 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
                                 execute_ranger_vrp(function*, bool)
                                 execute_one_pass(opt_pass*)
                                 execute_pass_list(function*, opt_pass*)
                                 ...

     2.15%  cc1plus  cc1plus  [.] bitmap_and_into(bitmap_head*, bitmap_head const*)
            |
             --2.14%--bitmap_and_into(bitmap_head*, bitmap_head const*)
                       |
                        --1.29%--0x1c9e90f
                                 df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)

     1.84%  cc1plus  cc1plus  [.] bitmap_bit_p(bitmap_head const*, int)
            |
             --1.78%--bitmap_bit_p(bitmap_head const*, int)
                       |
                        --0.86%--ranger_cache::block_range(vrange&, basic_block_def*, tree_node*, bool)

     1.50%  cc1plus  cc1plus  [.] get_ref_base_and_extent(tree_node*, poly_int<1u, long>*, poly_int<1u, long>*, poly_int<1u, long>*, bool*)
            |
             --1.49%--get_ref_base_and_extent(tree_node*, poly_int<1u, long>*, poly_int<1u, long>*, poly_int<1u, long>*, bool*)
                       |
                        --0.86%--0x1b15873
                                 stmt_may_clobber_ref_p_1(gimple*, ao_ref*, bool)

     1.20%  cc1plus  cc1plus  [.] pre_and_rev_post_order_compute_fn(function*, int*, int*, bool)
            |
             --0.95%--pre_and_rev_post_order_compute_fn(function*, int*, int*, bool)
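For context on the attribute this report is about: flatten is a GCC function attribute (clang accepts it too) that asks the compiler to inline every call inside the marked function, including calls exposed by that inlining, where possible. On an interpreter dispatch loop this can turn one function into a very large one, which is the kind of input that stresses whole-function passes. The sketch below is a hypothetical miniature, not Ladybird's code. The `MAYBE_FLATTEN` macro and the `NO_FLATTEN_ON_GCC` switch are invented names that show one way a project could drop the attribute for GCC only.

```cpp
#include <cstdint>

// Hypothetical opt-out switch: define NO_FLATTEN_ON_GCC to drop the attribute
// for GCC only. clang also defines __GNUC__, hence the separate __clang__ test.
#if defined(__clang__) || (defined(__GNUC__) && !defined(NO_FLATTEN_ON_GCC))
#  define MAYBE_FLATTEN __attribute__((flatten))
#else
#  define MAYBE_FLATTEN
#endif

namespace demo {

// Small helpers that a flattened caller would pull into its own body.
inline std::int64_t add_op(std::int64_t a, std::int64_t b) { return a + b; }
inline std::int64_t mul_op(std::int64_t a, std::int64_t b) { return a * b; }

enum class Op : std::uint8_t { Add, Mul };

// Toy dispatch loop. With flatten, every call in this body (and the calls
// that inlining exposes) is inlined where possible, so a real interpreter
// loop can grow into one huge function for the optimizers to process.
MAYBE_FLATTEN std::int64_t run(const Op* ops, const std::int64_t* args, int n,
                               std::int64_t acc) {
    for (int i = 0; i < n; ++i) {
        switch (ops[i]) {
        case Op::Add: acc = add_op(acc, args[i]); break;
        case Op::Mul: acc = mul_op(acc, args[i]); break;
        }
    }
    return acc;
}

}  // namespace demo
```

With `ops = {Add, Mul, Add}`, `args = {2, 3, 4}` and a starting accumulator of 1, `run` computes ((1 + 2) * 3) + 4 = 13. flatten only changes inlining decisions, not semantics, so building with `-DNO_FLATTEN_ON_GCC` gives the same result and only affects code size and compile time.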
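Many of the hot frames in the profile are bitmap operations under df_worklist_dataflow, and the df live regs and df live&initialized regs timers together account for about 8.4 s (9% + 10%) of the run. A minimal sketch of iterative backward liveness shows why such a solver is bitmap-bound, assuming the textbook formulation OUT[b] = union of IN[s] over successors s and IN[b] = USE[b] | (OUT[b] & ~DEF[b]). Each step is a whole-set union or and-not over all registers, repeated until a fixed point, so cost grows roughly with blocks × registers × iterations, all of which a flattened function plausibly inflates at once. The names `Block`, `Liveness` and `compute_liveness` are hypothetical, and `std::vector<bool>` stands in for GCC's sparse bitmaps. This is not GCC's df implementation; the mapping of `bitmap_ior_into` to the union step and `bitmap_ior_and_compl` to the USE | (OUT & ~DEF) step is an assumption suggested by the function names.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical CFG node: successor/predecessor indices plus per-register
// USE and DEF bits (one bit per register).
struct Block {
    std::vector<std::size_t> succs;
    std::vector<std::size_t> preds;
    std::vector<bool> use, def;
};

struct Liveness {
    std::vector<std::vector<bool>> in, out;
};

// Transfer function IN = USE | (OUT & ~DEF); returns true if IN changed.
static bool transfer(const Block& b, const std::vector<bool>& out,
                     std::vector<bool>& in) {
    bool changed = false;
    for (std::size_t r = 0; r < in.size(); ++r) {
        bool v = b.use[r] || (out[r] && !b.def[r]);
        if (v != in[r]) { in[r] = v; changed = true; }
    }
    return changed;
}

// Worklist solver for backward liveness over nregs registers.
Liveness compute_liveness(const std::vector<Block>& cfg, std::size_t nregs) {
    Liveness lv;
    lv.in.assign(cfg.size(), std::vector<bool>(nregs, false));
    lv.out.assign(cfg.size(), std::vector<bool>(nregs, false));
    std::deque<std::size_t> work;
    std::vector<bool> queued(cfg.size(), true);
    // Seed with every block, last block first (a good order for a backward problem).
    for (std::size_t b = cfg.size(); b-- > 0;) work.push_back(b);
    while (!work.empty()) {
        std::size_t b = work.front();
        work.pop_front();
        queued[b] = false;
        // Confluence: OUT[b] is the union of IN[s] over all successors s.
        std::vector<bool> out(nregs, false);
        for (std::size_t s : cfg[b].succs)
            for (std::size_t r = 0; r < nregs; ++r)
                if (lv.in[s][r]) out[r] = true;
        lv.out[b] = out;
        // If IN[b] changed, every predecessor's OUT may change: requeue them.
        if (transfer(cfg[b], lv.out[b], lv.in[b]))
            for (std::size_t p : cfg[b].preds)
                if (!queued[p]) { queued[p] = true; work.push_back(p); }
    }
    return lv;
}
```

For example, take three blocks B0 -> B1 -> B2 where B1 also loops back to itself, B0 defines r0, B1 uses r0 and defines r1, and B2 uses r1. The solver converges with IN[B2] = {r1}, IN[B1] = {r0}, OUT[B1] = {r0, r1} and IN[B0] = {}. The loop edge forces B1 to be visited twice, which is the kind of re-iteration that multiplies bitmap work on large CFGs.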