https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #28 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
A bit unrelated, but this shows that the threader seems a bit expensive on other builds too. Getting stats from the cc1plus LTO link with -flto-partition=one, it seems that the backwards threader and DOM are the two slowest tree passes. We get:

- 1% of build time for CCP, forward propagate, SLP vectorization
- 2% of build time for cfgcleanup, VRP, PTA, PRE, FRE
- 3% of build time for dominator optimization
- 4% of build time for backwards jump threading

For RTL we get:

- 1% of build time for fwprop, dse1, dse2, loop init, CPROP, CSE2, LRA live ranges
- 2% of build time for CSE, PRE, LRA non-specific, reload CSE
- 3% of build time for combiner
- 4% of build time for IRA and scheduler

Time variable                            usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  2301k (  0%)
 phase opt and generate             :1312.92 ( 98%)  70.40 ( 97%)1386.30 ( 98%) 26038M ( 97%)
 phase last asm                     :  27.07 (  2%)   1.63 (  2%)  28.77 (  2%)   376M (  1%)
 phase stream in                    :   0.96 (  0%)   0.32 (  0%)   1.29 (  0%)   464M (  2%)
 phase finalize                     :   3.64 (  0%)   0.47 (  1%)   4.12 (  0%)      0 (  0%)
 garbage collection                 :  27.45 (  2%)   0.04 (  0%)  27.54 (  2%)      0 (  0%)
 dump files                         :   3.53 (  0%)   0.35 (  0%)   4.37 (  0%)      0 (  0%)
 callgraph functions expansion      :1311.82 ( 98%)  70.34 ( 97%)1385.15 ( 98%) 26022M ( 97%)
 callgraph ipa passes               :   0.18 (  0%)   0.00 (  0%)   0.18 (  0%)      0 (  0%)
 ipa dead code removal              :   0.35 (  0%)   0.01 (  0%)   0.37 (  0%)      0 (  0%)
 ipa virtual call target            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    272 (  0%)
 ipa cp                             :   0.12 (  0%)   0.01 (  0%)   0.16 (  0%)    49M (  0%)
 ipa inlining heuristics            :   9.04 (  1%)   0.98 (  1%)   9.75 (  1%)   402M (  1%)
 lto stream decompression           :   1.66 (  0%)   0.17 (  0%)   1.60 (  0%)      0 (  0%)
 ipa lto gimple in                  :  36.64 (  3%)   3.55 (  5%)  40.05 (  3%)  3138M ( 12%)
 ipa lto decl in                    :   0.39 (  0%)   0.21 (  0%)   0.61 (  0%)   137M (  1%)
 ipa lto constructors in            :   0.45 (  0%)   0.05 (  0%)   0.45 (  0%)    60M (  0%)
 ipa lto cgraph I/O                 :   0.36 (  0%)   0.09 (  0%)   0.44 (  0%)   274M (  1%)
 ipa reference                      :   0.00 (  0%)   0.01 (  0%)   0.00 (  0%)      0 (  0%)
 ipa pure const                     :   0.51 (  0%)   0.06 (  0%)   0.65 (  0%)   342k (  0%)
 ipa modref                         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  4272k (  0%)
 cfg construction                   :   0.84 (  0%)   0.03 (  0%)   0.87 (  0%)    86M (  0%)
 cfg cleanup                        :  26.67 (  2%)   0.53 (  1%)  28.48 (  2%)   199M (  1%)
 trivially dead code                :   8.90 (  1%)   0.36 (  0%)   9.26 (  1%)   166k (  0%)
 df scan insns                      :   5.46 (  0%)   0.23 (  0%)   5.41 (  0%)  1498k (  0%)
 df reaching defs                   :  13.04 (  1%)   0.17 (  0%)  12.73 (  1%)      0 (  0%)
 df live regs                       :  48.91 (  4%)   0.75 (  1%)  49.22 (  3%)    14M (  0%)
 df live&initialized regs           :  20.46 (  2%)   0.33 (  0%)  20.80 (  1%)      0 (  0%)
 df must-initialized regs           :   1.05 (  0%)   0.01 (  0%)   1.00 (  0%)      0 (  0%)
 df use-def / def-use chains        :   5.91 (  0%)   0.08 (  0%)   6.25 (  0%)      0 (  0%)
 df live reg subwords               :   0.00 (  0%)   0.00 (  0%)   0.05 (  0%)      0 (  0%)
 df reg dead/unused notes           :  18.21 (  1%)   0.24 (  0%)  18.56 (  1%)   223M (  1%)
 register information               :   4.60 (  0%)   0.08 (  0%)   4.21 (  0%)      0 (  0%)
 alias analysis                     :  19.76 (  1%)   0.28 (  0%)  19.63 (  1%)   478M (  2%)
 alias stmt walking                 :  29.06 (  2%)   2.30 (  3%)  30.32 (  2%)    65M (  0%)
 register scan                      :   2.12 (  0%)   0.06 (  0%)   2.50 (  0%)    20M (  0%)
 rebuild jump labels                :   2.65 (  0%)   0.05 (  0%)   2.86 (  0%)    576 (  0%)
 integration                        :  35.83 (  3%)   6.81 (  9%)  41.90 (  3%)  2650M ( 10%)
 tree CFG cleanup                   :  21.87 (  2%)   2.23 (  3%)  24.72 (  2%)    35M (  0%)
 tree tail merge                    :   3.02 (  0%)   0.18 (  0%)   3.04 (  0%)   132M (  0%)
 tree VRP                           :  29.83 (  2%)   1.92 (  3%)  32.77 (  2%)   354M (  1%)
 tree copy propagation              :   5.06 (  0%)   0.36 (  0%)   4.95 (  0%)  6205k (  0%)
 tree PTA                           :  22.85 (  2%)   1.54 (  2%)  24.96 (  2%)   107M (  0%)
 tree SSA incremental               :  35.53 (  3%)   1.74 (  2%)  36.94 (  3%)   381M (  1%)
 tree operand scan                  :  44.83 (  3%)   6.17 (  8%)  50.12 (  4%)  1028M (  4%)
 dominator optimization             :  43.80 (  3%)   3.02 (  4%)  47.50 (  3%)   566M (  2%)
 backwards jump threading           :  49.72 (  4%)   2.27 (  3%)  53.04 (  4%)   412M (  2%)
 tree SRA                           :   0.61 (  0%)   0.07 (  0%)   0.64 (  0%)    14M (  0%)
 isolate eroneous paths             :   0.70 (  0%)   0.04 (  0%)   0.68 (  0%)  7987k (  0%)
 tree CCP                           :  18.10 (  1%)   1.54 (  2%)  18.86 (  1%)    62M (  0%)
 tree split crit edges              :   0.18 (  0%)   0.01 (  0%)   0.20 (  0%)    37M (  0%)
 tree reassociation                 :   2.34 (  0%)   0.21 (  0%)   2.60 (  0%)    10M (  0%)
 tree PRE                           :  24.31 (  2%)   1.57 (  2%)  26.42 (  2%)   394M (  1%)
 tree FRE                           :  24.13 (  2%)   2.07 (  3%)  26.24 (  2%)   119M (  0%)
 tree code sinking                  :   2.74 (  0%)   0.24 (  0%)   3.18 (  0%)   315M (  1%)
 tree linearize phis                :   1.43 (  0%)   0.08 (  0%)   1.69 (  0%)    42M (  0%)
 tree backward propagate            :   0.45 (  0%)   0.07 (  0%)   0.53 (  0%)     64 (  0%)
 tree forward propagate             :   9.53 (  1%)   0.97 (  1%)  10.80 (  1%)    65M (  0%)
 tree phiprop                       :   0.16 (  0%)   0.01 (  0%)   0.16 (  0%)   299k (  0%)
 tree conservative DCE              :   6.41 (  0%)   0.68 (  1%)   7.26 (  1%)  9555k (  0%)
 tree buildin call DCE              :   0.05 (  0%)   0.02 (  0%)   0.13 (  0%)      0 (  0%)
 tree DSE                           :   5.29 (  0%)   0.32 (  0%)   5.65 (  0%)    35M (  0%)
 PHI merge                          :   1.85 (  0%)   0.02 (  0%)   1.94 (  0%)    25M (  0%)
 tree loop optimization             :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)      0 (  0%)
 loopless fn                        :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 tree loop invariant motion         :   2.18 (  0%)   0.13 (  0%)   2.35 (  0%)  3995k (  0%)
 tree canonical iv                  :   0.95 (  0%)   0.05 (  0%)   0.90 (  0%)    21M (  0%)
 scev constant prop                 :   0.29 (  0%)   0.02 (  0%)   0.23 (  0%)  5916k (  0%)
 complete unrolling                 :   3.86 (  0%)   0.31 (  0%)   3.79 (  0%)   100M (  0%)
 tree vectorization                 :   0.45 (  0%)   0.01 (  0%)   0.47 (  0%)    16M (  0%)
 tree slp vectorization             :  10.89 (  1%)   5.59 (  8%)  16.42 (  1%)   767M (  3%)
 tree loop distribution             :   0.68 (  0%)   0.07 (  0%)   0.87 (  0%)    10M (  0%)
 tree iv optimization               :   5.06 (  0%)   0.23 (  0%)   5.58 (  0%)   144M (  1%)
 predictive commoning               :   0.86 (  0%)   0.09 (  0%)   0.86 (  0%)    22M (  0%)
 tree copy headers                  :   1.35 (  0%)   0.11 (  0%)   1.58 (  0%)    46M (  0%)
 tree SSA uncprop                   :   1.05 (  0%)   0.14 (  0%)   0.91 (  0%)    79k (  0%)
 tree NRV optimization              :   0.06 (  0%)   0.01 (  0%)   0.06 (  0%)   543k (  0%)
 tree switch lowering               :   0.79 (  0%)   0.03 (  0%)   0.75 (  0%)    37M (  0%)
 gimple CSE sin/cos                 :   0.19 (  0%)   0.00 (  0%)   0.11 (  0%)      0 (  0%)
 gimple widening/fma detection      :   0.62 (  0%)   0.04 (  0%)   0.69 (  0%)   828k (  0%)
 tree strlen optimization           :   1.52 (  0%)   0.13 (  0%)   1.49 (  0%)    78M (  0%)
 tree modref                        :   1.06 (  0%)   0.09 (  0%)   1.12 (  0%)    14M (  0%)
 dominance frontiers                :   1.22 (  0%)   0.09 (  0%)   1.61 (  0%)      0 (  0%)
 dominance computation              :  14.46 (  1%)   0.98 (  1%)  15.53 (  1%)      0 (  0%)
 control dependences                :   0.26 (  0%)   0.01 (  0%)   0.21 (  0%)      0 (  0%)
 out of ssa                         :   3.31 (  0%)   0.26 (  0%)   3.53 (  0%)  6688k (  0%)
 expand vars                        :  10.59 (  1%)   0.19 (  0%)  10.92 (  1%)   132M (  0%)
 expand                             :  23.83 (  2%)   1.16 (  2%)  24.71 (  2%)  2393M (  9%)
 post expand cleanups               :   1.95 (  0%)   0.10 (  0%)   2.23 (  0%)    83M (  0%)
 varconst                           :   0.00 (  0%)   0.02 (  0%)   0.03 (  0%)      0 (  0%)
 lower subreg                       :   0.33 (  0%)   0.01 (  0%)   0.33 (  0%)   570k (  0%)
 jump                               :   0.20 (  0%)   0.02 (  0%)   0.13 (  0%)      0 (  0%)
 forward prop                       :  13.81 (  1%)   0.44 (  1%)  14.32 (  1%)    20M (  0%)
 CSE                                :  24.85 (  2%)   0.58 (  1%)  26.76 (  2%)    75M (  0%)
 dead code elimination              :   3.46 (  0%)   0.10 (  0%)   3.59 (  0%)    16k (  0%)
 dead store elim1                   :   7.91 (  1%)   0.20 (  0%)   7.89 (  1%)   127M (  0%)
 dead store elim2                   :   7.83 (  1%)   0.19 (  0%)   7.90 (  1%)   163M (  1%)
 loop analysis                      :   0.12 (  0%)   0.01 (  0%)   0.11 (  0%)      0 (  0%)
 loop init                          :   8.77 (  1%)   0.77 (  1%)   9.89 (  1%)   498M (  2%)
 loop invariant motion              :   1.31 (  0%)   0.02 (  0%)   1.25 (  0%)  3328k (  0%)
 loop fini                          :   0.85 (  0%)   0.06 (  0%)   1.07 (  0%)   228k (  0%)
 CPROP                              :  18.61 (  1%)   0.56 (  1%)  19.28 (  1%)   434M (  2%)
 PRE                                :  25.73 (  2%)   0.42 (  1%)  26.09 (  2%)    14M (  0%)
 CSE 2                              :  14.80 (  1%)   0.36 (  0%)  14.99 (  1%)    40M (  0%)
 branch prediction                  :   0.13 (  0%)   0.02 (  0%)   0.18 (  0%)  1024k (  0%)
 combiner                           :  38.90 (  3%)   0.75 (  1%)  39.58 (  3%)   729M (  3%)
 if-conversion                      :   4.21 (  0%)   0.11 (  0%)   4.23 (  0%)    83M (  0%)
 mode switching                     :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)      0 (  0%)
 integrated RA                      :  59.64 (  4%)   1.08 (  1%)  59.95 (  4%)  1714M (  6%)
 LRA non-specific                   :  21.09 (  2%)   0.27 (  0%)  21.33 (  2%)   123M (  0%)
 LRA virtuals elimination           :   2.74 (  0%)   0.06 (  0%)   2.68 (  0%)    64M (  0%)
 LRA reload inheritance             :   4.85 (  0%)   0.03 (  0%)   4.47 (  0%)    70M (  0%)
 LRA create live ranges             :  12.15 (  1%)   0.09 (  0%)  12.34 (  1%)    15M (  0%)
 LRA hard reg assignment            :   2.17 (  0%)   0.03 (  0%)   2.38 (  0%)      0 (  0%)
 LRA coalesce pseudo regs           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      0 (  0%)
 LRA rematerialization              :   3.05 (  0%)   0.06 (  0%)   3.17 (  0%)   9312 (  0%)
 reload                             :   0.22 (  0%)   0.01 (  0%)   0.32 (  0%)  1134k (  0%)
 reload CSE regs                    :  21.03 (  2%)   0.38 (  1%)  22.08 (  2%)   229M (  1%)
 ree                                :   2.00 (  0%)   0.04 (  0%)   1.93 (  0%)  8158k (  0%)
 thread pro- & epilogue             :   4.69 (  0%)   0.11 (  0%)   4.79 (  0%)   100M (  0%)
 if-conversion 2                    :   0.94 (  0%)   0.02 (  0%)   1.11 (  0%)  2090k (  0%)
 combine stack adjustments          :   1.22 (  0%)   0.02 (  0%)   0.89 (  0%)    35k (  0%)
 peephole 2                         :   4.54 (  0%)   0.07 (  0%)   5.08 (  0%)    45M (  0%)
 hard reg cprop                     :   6.08 (  0%)   0.15 (  0%)   6.07 (  0%)  7038k (  0%)
 scheduling 2                       :  54.76 (  4%)   0.99 (  1%)  57.50 (  4%)   100M (  0%)
 machine dep reorg                  :   3.49 (  0%)   0.07 (  0%)   3.93 (  0%)  1502k (  0%)
 reorder blocks                     :   6.27 (  0%)   0.09 (  0%)   5.70 (  0%)   127M (  0%)
 shorten branches                   :   4.66 (  0%)   0.06 (  0%)   4.75 (  0%)    41k (  0%)
 reg stack                          :   0.05 (  0%)   0.00 (  0%)   0.10 (  0%)      0 (  0%)
 final                              :  22.18 (  2%)   1.11 (  2%)  23.16 (  2%)  1246M (  5%)
 variable output                    :   0.74 (  0%)   0.02 (  0%)   0.76 (  0%)    14M (  0%)
 symout                             :  35.49 (  3%)   1.93 (  3%)  37.27 (  3%)  2915M ( 11%)
 variable tracking                  :  23.14 (  2%)   0.44 (  1%)  23.53 (  2%)   801M (  3%)
 var-tracking dataflow              :  34.59 (  3%)   0.18 (  0%)  34.46 (  2%)    21M (  0%)
 var-tracking emit                  :  25.94 (  2%)   0.22 (  0%)  26.23 (  2%)   671M (  2%)
 tree if-combine                    :   0.64 (  0%)   0.09 (  0%)   0.65 (  0%)    12M (  0%)
 uninit var analysis                :   0.19 (  0%)   0.01 (  0%)   0.10 (  0%)    42k (  0%)
 straight-line strength reduction   :   1.18 (  0%)   0.05 (  0%)   1.08 (  0%)  8911k (  0%)
 store merging                      :   1.09 (  0%)   0.15 (  0%)   1.09 (  0%)    16M (  0%)
 initialize rtl                     :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)    34k (  0%)
 address lowering                   :   0.09 (  0%)   0.01 (  0%)   0.22 (  0%)  2001k (  0%)
 tree loop if-conversion            :   0.25 (  0%)   0.03 (  0%)   0.35 (  0%)  8491k (  0%)
 unaccounted optimizations          :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)      0 (  0%)
 rest of compilation                :  24.17 (  2%)   1.66 (  2%)  26.17 (  2%)   142M (  1%)
 unaccounted late compilation       :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)      0 (  0%)
 remove unused locals               :   4.95 (  0%)   0.68 (  1%)   5.75 (  0%)   420k (  0%)
 address taken                      :   3.96 (  0%)   1.38 (  2%)   5.19 (  0%)      0 (  0%)
 rebuild frequencies                :   0.76 (  0%)   0.13 (  0%)   1.18 (  0%)  2998k (  0%)
 repair loop structures             :   0.36 (  0%)   0.02 (  0%)   0.47 (  0%)    82k (  0%)
 TOTAL                              :1344.59         72.82       1420.49        26881M
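A per-pass ranking like the summary at the top of this comment can be recomputed from a raw -ftime-report dump. The following is a minimal Python sketch and is not part of GCC. The helper names `parse_time_report` and `top_passes` are hypothetical. The sketch assumes one timer per line in the layout shown in the report, and it assumes that timers named "phase ..." and "callgraph ..." are aggregates that should be skipped when ranking.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumption: each timer is printed on one line in the layout quoted above, e.g.
# " tree VRP   :  29.83 (  2%)   1.92 (  3%)  32.77 (  2%)   354M (  1%)".
# Other GCC versions may lay out -ftime-report differently.
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*"
    r"(?P<ggc>\d+[kMG]?)\s*\(\s*\d+%\)\s*$"
)
# The TOTAL line has no percentages, only usr, sys, wall and GGC.
TOTAL_RE = re.compile(
    r"^\s*TOTAL\s*:\s*(?P<usr>\d+\.\d+)\s+(?P<sys>\d+\.\d+)\s+(?P<wall>\d+\.\d+)"
)


class TimeVar(NamedTuple):
    name: str
    usr: float
    sys: float
    wall: float


def parse_time_report(text: str) -> Tuple[List[TimeVar], Optional[float]]:
    """Return all timer rows and the TOTAL wall time (None if absent)."""
    rows: List[TimeVar] = []
    total_wall: Optional[float] = None
    for line in text.splitlines():
        m = TOTAL_RE.match(line)
        if m:
            total_wall = float(m["wall"])
            continue
        m = ROW_RE.match(line)
        if m:
            rows.append(TimeVar(m["name"], float(m["usr"]),
                                float(m["sys"]), float(m["wall"])))
    return rows, total_wall


def top_passes(rows, total_wall, n=5, skip_prefixes=("phase ", "callgraph ")):
    """Rank timers by wall time as (name, wall seconds, % of total wall).

    Assumption: timers starting with 'phase ' or 'callgraph ' are aggregates
    that already include the per-pass timers, so they are skipped.
    """
    passes = [r for r in rows if not r.name.startswith(skip_prefixes)]
    passes.sort(key=lambda r: r.wall, reverse=True)
    return [(r.name, r.wall, 100.0 * r.wall / total_wall) for r in passes[:n]]


# A few rows copied from the report above.
SAMPLE = """\
Time variable                            usr           sys          wall           GGC
 phase opt and generate             :1312.92 ( 98%)  70.40 ( 97%)1386.30 ( 98%) 26038M ( 97%)
 tree VRP                           :  29.83 (  2%)   1.92 (  3%)  32.77 (  2%)   354M (  1%)
 dominator optimization             :  43.80 (  3%)   3.02 (  4%)  47.50 (  3%)   566M (  2%)
 backwards jump threading           :  49.72 (  4%)   2.27 (  3%)  53.04 (  4%)   412M (  2%)
 integrated RA                      :  59.64 (  4%)   1.08 (  1%)  59.95 (  4%)  1714M (  6%)
 TOTAL                              :1344.59         72.82       1420.49        26881M
"""

if __name__ == "__main__":
    rows, total = parse_time_report(SAMPLE)
    for name, wall, pct in top_passes(rows, total, n=3):
        print(f"{name:30s} {wall:8.2f}s {pct:5.1f}%")
```

On the full report, the top three timers by wall time would be integrated RA (59.95 s), scheduling 2 (57.50 s) and backwards jump threading (53.04 s). A raw ranking like this also picks up shared-infrastructure timers such as tree operand scan and df live regs, which are not passes in their own right.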