On Tue, Jan 15, 2019 at 10:45 PM Giuliano Belinassi
<giuliano.belina...@usp.br> wrote:
>
> Hi
>
> I've managed to compile gimple-match.c with -ftime-report, and "phase opt and
> generate" seems to be what takes most of the compilation time. This is captured
> by the "TV_PHASE_OPT_GEN" timevar, and all its occurrences seem to be in
> toplev.c and lto.c.

TV_PHASE_OPT_GEN covers nearly everything besides parsing. Thus all stuff below
"phase *" is covered by one of the phases.

It would probably be nice to split up TV_PHASE_OPT_GEN into GIMPLE, IPA and
RTL optimization phases.

> Any ideas of which part captured by this variable is the most costly? Also,
> is that percentage in the "GGC" column the amount of time inside the
> Garbage Collector?

The percentage for the GGC column is the percentage of total GGC memory,
not time. See timevar.c:print_row.

The most costly part of opt-and-generate is the various verifiers. See the note
printed at the bottom:

> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.

You can get a clearer picture when you configure GCC with
--enable-checking=release. For a quick start, passing -fno-checking will
already disable the most costly bits.

Richard.

>
> Time variable                            usr           sys          wall               GGC
>  phase setup                        :   0.01 (  0%)   0.01 (  0%)   0.02 (  0%)    1473 kB (  0%)
>  phase parsing                      :   3.74 (  4%)   1.43 ( 30%)   5.17 (  5%)  294287 kB ( 16%)
>  phase lang. deferred               :   0.08 (  0%)   0.03 (  1%)   0.11 (  0%)    7582 kB (  0%)
>  phase opt and generate             :  94.10 ( 95%)   3.26 ( 67%)  97.46 ( 93%) 1543477 kB ( 82%)
>  phase last asm                     :   0.89 (  1%)   0.09 (  2%)   0.98 (  1%)   39802 kB (  2%)
>  phase finalize                     :   0.00 (  0%)   0.01 (  0%)   0.50 (  0%)       0 kB (  0%)
>  |name lookup                       :   0.42 (  0%)   0.12 (  2%)   0.46 (  0%)    6162 kB (  0%)
>  |overload resolution               :   0.37 (  0%)   0.13 (  3%)   0.42 (  0%)   18172 kB (  1%)
>  garbage collection                 :   2.99 (  3%)   0.03 (  1%)   3.02 (  3%)       0 kB (  0%)
>  dump files                         :   0.11 (  0%)   0.01 (  0%)   0.16 (  0%)       0 kB (  0%)
>  callgraph construction             :   0.35 (  0%)   0.01 (  0%)   0.24 (  0%)   61143 kB (  3%)
>  callgraph optimization             :   0.21 (  0%)   0.01 (  0%)   0.17 (  0%)     175 kB (  0%)
>  ipa function summary               :   0.12 (  0%)   0.00 (  0%)   0.14 (  0%)    2216 kB (  0%)
>  ipa dead code removal              :   0.04 (  0%)   0.01 (  0%)   0.00 (  0%)       0 kB (  0%)
>  ipa devirtualization               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  ipa cp                             :   0.33 (  0%)   0.01 (  0%)   0.39 (  0%)    9073 kB (  0%)
>  ipa inlining heuristics            :   0.48 (  0%)   0.00 (  0%)   0.48 (  0%)    6175 kB (  0%)
>  ipa function splitting             :   0.10 (  0%)   0.01 (  0%)   0.07 (  0%)    9111 kB (  0%)
>  ipa comdats                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
>  ipa various optimizations          :   0.03 (  0%)   0.03 (  1%)   0.01 (  0%)     480 kB (  0%)
>  ipa reference                      :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
>  ipa profile                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  ipa pure const                     :   0.13 (  0%)   0.00 (  0%)   0.12 (  0%)       8 kB (  0%)
>  ipa icf                            :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)       6 kB (  0%)
>  ipa SRA                            :   1.26 (  1%)   0.28 (  6%)   1.78 (  2%)  165814 kB (  9%)
>  ipa free lang data                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
>  ipa free inline summary            :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)       0 kB (  0%)
>  cfg construction                   :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)    7926 kB (  0%)
>  cfg cleanup                        :   1.84 (  2%)   0.00 (  0%)   1.73 (  2%)   13673 kB (  1%)
>  CFG verifier                       :   6.05 (  6%)   0.12 (  2%)   6.80 (  7%)       0 kB (  0%)
>  trivially dead code                :   0.32 (  0%)   0.01 (  0%)   0.38 (  0%)       0 kB (  0%)
>  df scan insns                      :   0.23 (  0%)   0.00 (  0%)   0.30 (  0%)      28 kB (  0%)
>  df multiple defs                   :   0.13 (  0%)   0.00 (  0%)   0.20 (  0%)       0 kB (  0%)
>  df reaching defs                   :   0.52 (  1%)   0.00 (  0%)   0.55 (  1%)       0 kB (  0%)
>  df live regs                       :   2.70 (  3%)   0.02 (  0%)   3.08 (  3%)     425 kB (  0%)
>  df live&initialized regs           :   1.28 (  1%)   0.00 (  0%)   1.13 (  1%)       0 kB (  0%)
>  df must-initialized regs           :   0.14 (  0%)   0.00 (  0%)   0.16 (  0%)       0 kB (  0%)
>  df use-def / def-use chains        :   0.32 (  0%)   0.00 (  0%)   0.26 (  0%)       0 kB (  0%)
>  df reg dead/unused notes           :   0.96 (  1%)   0.01 (  0%)   0.89 (  1%)   11726 kB (  1%)
>  register information               :   0.29 (  0%)   0.00 (  0%)   0.21 (  0%)       0 kB (  0%)
>  alias analysis                     :   0.54 (  1%)   0.00 (  0%)   0.53 (  1%)   17487 kB (  1%)
>  alias stmt walking                 :   1.10 (  1%)   0.08 (  2%)   1.22 (  1%)     118 kB (  0%)
>  register scan                      :   0.08 (  0%)   0.01 (  0%)   0.08 (  0%)     118 kB (  0%)
>  rebuild jump labels                :   0.12 (  0%)   0.01 (  0%)   0.11 (  0%)       0 kB (  0%)
>  preprocessing                      :   0.29 (  0%)   0.43 (  9%)   0.65 (  1%)   37409 kB (  2%)
>  parser (global)                    :   0.39 (  0%)   0.39 (  8%)   0.94 (  1%)   92661 kB (  5%)
>  parser struct body                 :   0.07 (  0%)   0.00 (  0%)   0.08 (  0%)    6159 kB (  0%)
>  parser enumerator list             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    3342 kB (  0%)
>  parser function body               :   2.37 (  2%)   0.43 (  9%)   2.82 (  3%)  119124 kB (  6%)
>  parser inl. func. body             :   0.18 (  0%)   0.05 (  1%)   0.16 (  0%)   10354 kB (  1%)
>  parser inl. meth. body             :   0.04 (  0%)   0.01 (  0%)   0.03 (  0%)    2986 kB (  0%)
>  template instantiation             :   0.17 (  0%)   0.08 (  2%)   0.26 (  0%)   15801 kB (  1%)
>  constant expression evaluation     :   0.06 (  0%)   0.05 (  1%)   0.07 (  0%)     516 kB (  0%)
>  early inlining heuristics          :   0.13 (  0%)   0.00 (  0%)   0.08 (  0%)   19547 kB (  1%)
>  inline parameters                  :   0.14 (  0%)   0.01 (  0%)   0.22 (  0%)    3372 kB (  0%)
>  integration                        :   1.00 (  1%)   0.23 (  5%)   1.22 (  1%)  132386 kB (  7%)
>  tree gimplify                      :   0.36 (  0%)   0.02 (  0%)   0.31 (  0%)   63162 kB (  3%)
>  tree eh                            :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)    4173 kB (  0%)
>  tree CFG construction              :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)   20805 kB (  1%)
>  tree CFG cleanup                   :   1.40 (  1%)   0.14 (  3%)   1.57 (  2%)    3995 kB (  0%)
>  tree tail merge                    :   0.17 (  0%)   0.01 (  0%)   0.16 (  0%)    7251 kB (  0%)
>  tree VRP                           :   1.94 (  2%)   0.08 (  2%)   1.83 (  2%)   40527 kB (  2%)
>  tree Early VRP                     :   0.27 (  0%)   0.03 (  1%)   0.30 (  0%)    3298 kB (  0%)
>  tree copy propagation              :   0.14 (  0%)   0.00 (  0%)   0.08 (  0%)     427 kB (  0%)
>  tree PTA                           :   0.61 (  1%)   0.03 (  1%)   0.53 (  1%)    3861 kB (  0%)
>  tree PHI insertion                 :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)    8529 kB (  0%)
>  tree SSA rewrite                   :   0.23 (  0%)   0.03 (  1%)   0.43 (  0%)   24334 kB (  1%)
>  tree SSA other                     :   0.10 (  0%)   0.01 (  0%)   0.10 (  0%)     538 kB (  0%)
>  tree SSA incremental               :   0.79 (  1%)   0.07 (  1%)   0.88 (  1%)   11828 kB (  1%)
>  tree operand scan                  :   1.33 (  1%)   0.30 (  6%)   1.51 (  1%)   56249 kB (  3%)
>  dominator optimization             :   1.92 (  2%)   0.07 (  1%)   1.90 (  2%)   31786 kB (  2%)
>  backwards jump threading           :   0.20 (  0%)   0.02 (  0%)   0.16 (  0%)    8676 kB (  0%)
>  tree SRA                           :   0.17 (  0%)   0.01 (  0%)   0.09 (  0%)    6050 kB (  0%)
>  isolate eroneous paths             :   0.01 (  0%)   0.00 (  0%)   0.04 (  0%)    1319 kB (  0%)
>  tree CCP                           :   0.67 (  1%)   0.08 (  2%)   0.62 (  1%)    4190 kB (  0%)
>  tree PHI const/copy prop           :   0.10 (  0%)   0.00 (  0%)   0.02 (  0%)     132 kB (  0%)
>  tree split crit edges              :   0.12 (  0%)   0.00 (  0%)   0.15 (  0%)   10236 kB (  1%)
>  tree reassociation                 :   0.14 (  0%)   0.00 (  0%)   0.08 (  0%)     168 kB (  0%)
>  tree PRE                           :   0.74 (  1%)   0.04 (  1%)   0.76 (  1%)   16728 kB (  1%)
>  tree FRE                           :   0.69 (  1%)   0.04 (  1%)   0.60 (  1%)    5370 kB (  0%)
>  tree code sinking                  :   0.06 (  0%)   0.01 (  0%)   0.06 (  0%)    9670 kB (  1%)
>  tree linearize phis                :   0.10 (  0%)   0.00 (  0%)   0.09 (  0%)     699 kB (  0%)
>  tree backward propagate            :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  tree forward propagate             :   0.52 (  1%)   0.04 (  1%)   0.48 (  0%)    3055 kB (  0%)
>  tree phiprop                       :   0.05 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  tree conservative DCE              :   0.27 (  0%)   0.03 (  1%)   0.43 (  0%)    1557 kB (  0%)
>  tree aggressive DCE                :   0.21 (  0%)   0.04 (  1%)   0.23 (  0%)    2565 kB (  0%)
>  tree buildin call DCE              :   0.00 (  0%)   0.00 (  0%)   0.04 (  0%)       0 kB (  0%)
>  tree DSE                           :   0.18 (  0%)   0.01 (  0%)   0.18 (  0%)     274 kB (  0%)
>  PHI merge                          :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)    3170 kB (  0%)
>  tree loop optimization             :   0.00 (  0%)   0.00 (  0%)   0.04 (  0%)       0 kB (  0%)
>  loopless fn                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  tree loop invariant motion         :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
>  tree canonical iv                  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      58 kB (  0%)
>  complete unrolling                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     361 kB (  0%)
>  tree iv optimization               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     128 kB (  0%)
>  tree copy headers                  :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     414 kB (  0%)
>  tree SSA uncprop                   :   0.06 (  0%)   0.00 (  0%)   0.09 (  0%)       0 kB (  0%)
>  tree NRV optimization              :   0.01 (  0%)   0.00 (  0%)   0.05 (  0%)      14 kB (  0%)
>  tree SSA verifier                  :   8.44 (  9%)   0.26 (  5%)   8.77 (  8%)       0 kB (  0%)
>  tree STMT verifier                 :  12.57 ( 13%)   0.35 (  7%)  13.03 ( 12%)       0 kB (  0%)
>  tree switch conversion             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       5 kB (  0%)
>  tree switch lowering               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    1194 kB (  0%)
>  gimple CSE sin/cos                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  gimple widening/fma detection      :   0.06 (  0%)   0.00 (  0%)   0.03 (  0%)       2 kB (  0%)
>  tree strlen optimization           :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)       0 kB (  0%)
>  callgraph verifier                 :   0.93 (  1%)   0.07 (  1%)   0.99 (  1%)       0 kB (  0%)
>  dominance frontiers                :   0.14 (  0%)   0.00 (  0%)   0.07 (  0%)       0 kB (  0%)
>  dominance computation              :   1.98 (  2%)   0.05 (  1%)   2.17 (  2%)       0 kB (  0%)
>  control dependences                :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  out of ssa                         :   0.11 (  0%)   0.00 (  0%)   0.11 (  0%)     253 kB (  0%)
>  expand vars                        :   0.12 (  0%)   0.00 (  0%)   0.12 (  0%)    5803 kB (  0%)
>  expand                             :   0.68 (  1%)   0.02 (  0%)   0.75 (  1%)  129150 kB (  7%)
>  post expand cleanups               :   0.09 (  0%)   0.00 (  0%)   0.03 (  0%)    1400 kB (  0%)
>  varconst                           :   0.01 (  0%)   0.01 (  0%)   0.01 (  0%)      13 kB (  0%)
>  lower subreg                       :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)      63 kB (  0%)
>  forward prop                       :   0.32 (  0%)   0.01 (  0%)   0.34 (  0%)    7384 kB (  0%)
>  CSE                                :   1.03 (  1%)   0.02 (  0%)   0.95 (  1%)    4656 kB (  0%)
>  dead code elimination              :   0.23 (  0%)   0.00 (  0%)   0.22 (  0%)       0 kB (  0%)
>  dead store elim1                   :   0.40 (  0%)   0.00 (  0%)   0.34 (  0%)    5665 kB (  0%)
>  dead store elim2                   :   0.60 (  1%)   0.00 (  0%)   0.65 (  1%)    9079 kB (  0%)
>  loop analysis                      :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
>  loop init                          :   1.31 (  1%)   0.05 (  1%)   1.64 (  2%)    5802 kB (  0%)
>  loop invariant motion              :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)      19 kB (  0%)
>  loop fini                          :   0.02 (  0%)   0.01 (  0%)   0.04 (  0%)       0 kB (  0%)
>  CPROP                              :   1.27 (  1%)   0.01 (  0%)   1.14 (  1%)   30881 kB (  2%)
>  PRE                                :   0.61 (  1%)   0.00 (  0%)   0.59 (  1%)    1920 kB (  0%)
>  CSE 2                              :   0.57 (  1%)   0.01 (  0%)   0.58 (  1%)    2822 kB (  0%)
>  branch prediction                  :   0.08 (  0%)   0.01 (  0%)   0.10 (  0%)     887 kB (  0%)
>  combiner                           :   1.15 (  1%)   0.00 (  0%)   1.28 (  1%)   35520 kB (  2%)
>  if-conversion                      :   0.24 (  0%)   0.00 (  0%)   0.22 (  0%)    5851 kB (  0%)
>  integrated RA                      :   2.29 (  2%)   0.03 (  1%)   2.37 (  2%)   54041 kB (  3%)
>  LRA non-specific                   :   0.97 (  1%)   0.01 (  0%)   1.04 (  1%)    5294 kB (  0%)
>  LRA virtuals elimination           :   0.44 (  0%)   0.00 (  0%)   0.39 (  0%)    6089 kB (  0%)
>  LRA reload inheritance             :   0.17 (  0%)   0.00 (  0%)   0.27 (  0%)    5783 kB (  0%)
>  LRA create live ranges             :   1.07 (  1%)   0.00 (  0%)   1.09 (  1%)    1004 kB (  0%)
>  LRA hard reg assignment            :   0.11 (  0%)   0.00 (  0%)   0.09 (  0%)       0 kB (  0%)
>  LRA rematerialization              :   0.20 (  0%)   0.00 (  0%)   0.20 (  0%)       0 kB (  0%)
>  reload                             :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)       0 kB (  0%)
>  reload CSE regs                    :   0.90 (  1%)   0.01 (  0%)   0.80 (  1%)   13780 kB (  1%)
>  ree                                :   0.13 (  0%)   0.00 (  0%)   0.10 (  0%)     589 kB (  0%)
>  thread pro- & epilogue             :   0.51 (  1%)   0.01 (  0%)   0.57 (  1%)    2328 kB (  0%)
>  if-conversion 2                    :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)     319 kB (  0%)
>  combine stack adjustments          :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
>  peephole 2                         :   0.12 (  0%)   0.00 (  0%)   0.18 (  0%)    1242 kB (  0%)
>  hard reg cprop                     :   0.57 (  1%)   0.00 (  0%)   0.49 (  0%)     189 kB (  0%)
>  scheduling 2                       :   2.53 (  3%)   0.03 (  1%)   2.53 (  2%)    5740 kB (  0%)
>  machine dep reorg                  :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)       0 kB (  0%)
>  reorder blocks                     :   0.74 (  1%)   0.01 (  0%)   0.69 (  1%)    6926 kB (  0%)
>  shorten branches                   :   0.20 (  0%)   0.00 (  0%)   0.16 (  0%)       0 kB (  0%)
>  final                              :   0.85 (  1%)   0.01 (  0%)   0.97 (  1%)  115151 kB (  6%)
>  symout                             :   1.17 (  1%)   0.11 (  2%)   1.25 (  1%)  202121 kB ( 11%)
>  variable tracking                  :   0.77 (  1%)   0.01 (  0%)   0.81 (  1%)   45792 kB (  2%)
>  var-tracking dataflow              :   1.30 (  1%)   0.01 (  0%)   1.24 (  1%)     926 kB (  0%)
>  var-tracking emit                  :   1.43 (  1%)   0.01 (  0%)   1.42 (  1%)   57281 kB (  3%)
>  tree if-combine                    :   0.06 (  0%)   0.00 (  0%)   0.02 (  0%)     417 kB (  0%)
>  uninit var analysis                :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
>  straight-line strength reduction   :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)     525 kB (  0%)
>  store merging                      :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)     492 kB (  0%)
>  initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.04 (  0%)      12 kB (  0%)
>  address lowering                   :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)       2 kB (  0%)
>  early local passes                 :   0.02 (  0%)   0.01 (  0%)   0.00 (  0%)       0 kB (  0%)
>  unaccounted optimizations          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
>  rest of compilation                :   1.29 (  1%)   0.01 (  0%)   1.11 (  1%)    5063 kB (  0%)
>  remove unused locals               :   0.25 (  0%)   0.04 (  1%)   0.25 (  0%)      37 kB (  0%)
>  address taken                      :   0.11 (  0%)   0.10 (  2%)   0.25 (  0%)       0 kB (  0%)
>  verify loop closed                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  verify RTL sharing                 :   5.24 (  5%)   0.05 (  1%)   5.37 (  5%)       0 kB (  0%)
>  rebuild frequencies                :   0.04 (  0%)   0.00 (  0%)   0.06 (  0%)     621 kB (  0%)
>  repair loop structures             :   0.17 (  0%)   0.00 (  0%)   0.24 (  0%)       0 kB (  0%)
>  TOTAL                              :  98.82          4.83        104.24        1886632 kB
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
>
> real    1m54.934s
> user    1m48.938s
> sys     0m5.196s
>
>
> Thank you
> Giuliano.
>
> On 01/14, Richard Biener wrote:
> > On Mon, Jan 14, 2019 at 12:41 PM Giuliano Belinassi
> > <giuliano.belina...@usp.br> wrote:
> > >
> > > Hi,
> > >
> > > I am currently studying the GIMPLE IR documentation and thinking about a
> > > way to easily gather the timing information. I was thinking about
> > > adding this feature to gcc to show/dump the elapsed time on GIMPLE. Does
> > > this make sense? Is this already implemented somewhere? Where is a good
> > > place to start?
> >
> > There's -ftime-report which more-or-less tells you the time spent in the
> > individual passes. I think there's no overall group to count GIMPLE
> > optimizers vs. RTL optimizers though.
> >
> > > Richard Biener: I would like to know what your nickname on IRC is :)
> >
> > It's richi.
> >
> > Richard.
> >
> > > Thank you,
> > > Giuliano.
> > >
> > > On 12/17, Richard Biener wrote:
> > > > On Wed, Dec 12, 2018 at 4:46 PM Giuliano Augusto Faulin Belinassi
> > > > <giuliano.belina...@usp.br> wrote:
> > > > >
> > > > > Hi, I have some news. :-)
> > > > >
> > > > > I replicated the Martin Liška experiment [1] on a 64-core machine for
> > > > > gcc [2] and Linux kernel [3] (Linux kernel was fully parallelized),
> > > > > and I am excited to dive into this problem.
> > > > > As a result, I want to propose a GSoC project on this issue,
> > > > > starting with something like:
> > > > > 1- Systematically create a benchmark for easy information
> > > > > gathering. Martin Liška already made the first version of it, but I
> > > > > need to improve it.
> > > > > 2- Find and document the global states (and try to reduce gcc's
> > > > > global states as well).
> > > > > 3- Define the parallelization strategy.
> > > > > 4- First parallelization attempt.
> > > > >
> > > > > I also proposed this issue as a research project to my advisor and he
> > > > > supported me on this idea. So I can work for at least one year on
> > > > > this, and other things related to it.
> > > > >
> > > > > Would anyone be willing to mentor me on this?
> > > >
> > > > As the one who initially suggested the project I'm certainly willing
> > > > to mentor you on this.
> > > >
> > > > Richard.
> > > >
> > > > > [1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > > > > [2] https://www.ime.usp.br/~belinass/64cores-experiment.svg
> > > > > [3] https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
> > > > > On Mon, Nov 19, 2018 at 8:53 AM Richard Biener
> > > > > <richard.guent...@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi
> > > > > > <giuliano.belina...@usp.br> wrote:
> > > > > > >
> > > > > > > Hi! Sorry for the late reply again :P
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 8:29 AM Richard Biener
> > > > > > > <richard.guent...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin
> > > > > > > > Belinassi <giuliano.belina...@usp.br> wrote:
> > > > > > > > >
> > > > > > > > > As a brief introduction, I am a graduate student who got
> > > > > > > > > interested in the "Parallelize the compilation using
> > > > > > > > > threads" project (GSoC 2018 [1]).
> > > > > > > > > I am a newcomer to GCC, but have already sent some patches,
> > > > > > > > > some of which have already been accepted [2].
> > > > > > > > >
> > > > > > > > > I brought this subject up on IRC, but maybe here is a proper
> > > > > > > > > place to discuss this topic.
> > > > > > > > >
> > > > > > > > > From my point of view, parallelizing GCC itself will only
> > > > > > > > > speed up the compilation of projects which have a big file
> > > > > > > > > that creates a bottleneck in the whole project compilation
> > > > > > > > > (note: by big, I mean the amount of code to generate).
> > > > > > > >
> > > > > > > > That's true. During GCC bootstrap there are some of those (see
> > > > > > > > PR84402).
> > > > > > > >
> > > > > > > > One way to improve parallelism is to use link-time optimization
> > > > > > > > where even single source files can be split up into multiple
> > > > > > > > link-time units. But then there's the serial whole-program
> > > > > > > > analysis part.
> > > > > > >
> > > > > > > Did you mean this:
> > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
> > > > > > > That is a lot of data :-)
> > > > > > >
> > > > > > > It seems that 'phase opt and generate' is the most time-consuming
> > > > > > > part. Is that the 'GIMPLE optimization pipeline' you were talking
> > > > > > > about in this thread:
> > > > > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
> > > > > >
> > > > > > It's everything that comes after the frontend parsing bits, thus
> > > > > > this includes in particular RTL optimization and early GIMPLE
> > > > > > optimizations.
> > > > > >
> > > > > > > > > Additionally, I know that GCC must not change the project
> > > > > > > > > layout, but from the software engineering perspective, this
> > > > > > > > > may be a bad smell that indicates that the file should be
> > > > > > > > > broken into smaller files. Finally, the Makefiles will take
> > > > > > > > > care of the parallelization task.
> > > > > > > >
> > > > > > > > What do you mean by GCC must not change the project layout? GCC
> > > > > > > > happily re-orders functions and link-time optimization will
> > > > > > > > reorder TUs (well, linking may as well).
> > > > > > > >
> > > > > > >
> > > > > > > That was a response to a comment made on IRC:
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 9:44 AM Jonathan Wakely
> > > > > > > <jwakely....@gmail.com> wrote:
> > > > > > > >I think this is in response to a comment I made on IRC. Giuliano
> > > > > > > >said that if a project has a very large file that dominates the
> > > > > > > >total build time, the file should be split up into smaller
> > > > > > > >pieces. I said "GCC can't restructure people's code. it can only
> > > > > > > >try to compile it faster". We weren't referring to code
> > > > > > > >transformations in the compiler like re-ordering functions, but
> > > > > > > >physically refactoring the source code.
> > > > > > >
> > > > > > > Yes. But from one of the attachments from PR84402, it seems that
> > > > > > > such files exist in GCC,
> > > > > > > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > > > > > >
> > > > > > > > > My questions are:
> > > > > > > > >
> > > > > > > > > 1. Is there any project compilation that will significantly
> > > > > > > > > be improved if GCC runs in parallel? Does someone have data
> > > > > > > > > about something related to that?
> > > > > > > > > How about the Linux Kernel? If not, I can try to bring some.
> > > > > > > >
> > > > > > > > We do not have any data about this apart from experiments with
> > > > > > > > splitting up source files for PR84402.
> > > > > > > >
> > > > > > > > > 2. Did I correctly understand the goal of the
> > > > > > > > > parallelization? Can anyone provide extra details to me?
> > > > > > > >
> > > > > > > > You may want to search the mailing list archives since we had a
> > > > > > > > student application (later revoked) for the task with some
> > > > > > > > discussion.
> > > > > > > >
> > > > > > > > In my view (I proposed the thing) the most interesting parts are
> > > > > > > > getting GCC's global state documented and reduced. The
> > > > > > > > parallelization itself is an interesting experiment but whether
> > > > > > > > there will be any substantial improvement for builds that can
> > > > > > > > already benefit from make parallelism remains a question.
> > > > > > >
> > > > > > > While I agree that documenting GCC's global states is good for the
> > > > > > > community and the development of GCC, I really don't think this is
> > > > > > > a good motivation for parallelizing a compiler from a research
> > > > > > > standpoint.
> > > > > >
> > > > > > True ;) Note that my suggestions to the other GSoC student were
> > > > > > purely based on where it's easiest to experiment with
> > > > > > parallelization and not where it would be most beneficial.
> > > > > >
> > > > > > > There must be something or someone that could take advantage of
> > > > > > > the fine-grained parallelism. But that data from PR84402 seems to
> > > > > > > have the answer to it.
> > > > > > > :-)
> > > > > > >
> > > > > > > On Thu, Nov 15, 2018 at 4:07 PM Szabolcs Nagy
> > > > > > > <szabolcs.n...@arm.com> wrote:
> > > > > > > >
> > > > > > > > On 15/11/18 10:29, Richard Biener wrote:
> > > > > > > > > In my view (I proposed the thing) the most interesting parts
> > > > > > > > > are getting GCCs global state documented and reduced. The
> > > > > > > > > parallelization itself is an interesting experiment but
> > > > > > > > > whether there will be any substantial improvement for builds
> > > > > > > > > that can already benefit from make parallelism remains a
> > > > > > > > > question.
> > > > > > > >
> > > > > > > > in the common case (project with many small files, much more
> > > > > > > > than core count) i'd expect a regression:
> > > > > > > >
> > > > > > > > if gcc itself tries to parallelize that introduces inter thread
> > > > > > > > synchronization and potential false sharing in gcc (e.g. malloc
> > > > > > > > locks) that does not exist with make parallelism (glibc can
> > > > > > > > avoid some atomic instructions when a process is single
> > > > > > > > threaded).
> > > > > > >
> > > > > > > That is what I am mostly worried about. Or the most costly part
> > > > > > > is not parallelizable at all. Also, I would expect a regression on
> > > > > > > very small files, which probably could be avoided by implementing
> > > > > > > this feature as a flag?
> > > > > >
> > > > > > I think the issue should be avoided by avoiding fine-grained
> > > > > > parallelism. Which might be somewhat hard given there are core data
> > > > > > structures that are shared (the memory allocator for a start).
> > > > > >
> > > > > > The other issue I am more worried about is that we probably have to
> > > > > > interact with make somehow so that we do not end up with 64 threads
> > > > > > when one does -j8 on an 8 core machine.
> > > > > > That's basically the same issue we run into with -flto and its
> > > > > > threaded WPA writeout or recursive invocation of make.
> > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 16, 2018 at 11:05 AM Martin Jambor <mjam...@suse.cz>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Giuliano,
> > > > > > > >
> > > > > > > > On Thu, Nov 15 2018, Richard Biener wrote:
> > > > > > > > > You may want to search the mailing list archives since we had
> > > > > > > > > a student application (later revoked) for the task with some
> > > > > > > > > discussion.
> > > > > > > >
> > > > > > > > Specifically, the whole thread beginning with
> > > > > > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html
> > > > > > > >
> > > > > > > > Martin
> > > > > > > >
> > > > > > >
> > > > > > > Yes, I will research this carefully ;-)
> > > > > > >
> > > > > > > Thank you
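
To put a rough number on the remark that the verifiers are the most costly
part of opt-and-generate, the usr column of the -ftime-report output quoted
in this thread can be tallied with a few lines of Python. This is only a
sketch under stated assumptions: the helper name verifier_share is made up
for illustration, the "name contains verif" test is a heuristic for picking
out verifier rows and not something GCC defines, and SAMPLE holds just a
handful of rows copied from the report rather than the full table.

```python
import re

# One timing row looks like: "name : usr ( p%) sys ( p%) wall ( p%) ...".
# Only the name and the usr seconds are captured here.
ROW = re.compile(r"^(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)")
# The TOTAL row carries no percentages, so it gets its own pattern.
TOTAL = re.compile(r"^TOTAL\s*:\s*(?P<usr>\d+\.\d+)")

# A few rows copied from the report in this thread (not the whole table).
SAMPLE = """\
>  phase opt and generate             :  94.10 ( 95%)   3.26 ( 67%)  97.46 ( 93%) 1543477 kB ( 82%)
>  CFG verifier                       :   6.05 (  6%)   0.12 (  2%)   6.80 (  7%)       0 kB (  0%)
>  tree SSA verifier                  :   8.44 (  9%)   0.26 (  5%)   8.77 (  8%)       0 kB (  0%)
>  tree STMT verifier                 :  12.57 ( 13%)   0.35 (  7%)  13.03 ( 12%)       0 kB (  0%)
>  callgraph verifier                 :   0.93 (  1%)   0.07 (  1%)   0.99 (  1%)       0 kB (  0%)
>  verify loop closed                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
>  verify RTL sharing                 :   5.24 (  5%)   0.05 (  1%)   5.37 (  5%)       0 kB (  0%)
>  TOTAL                              :  98.82          4.83        104.24        1886632 kB
"""


def verifier_share(report):
    """Return (verifier usr seconds, total usr seconds) for a time report.

    Hypothetical helper: rows whose name contains "verif" are counted as
    verifiers, which is an assumption made for this sketch.
    """
    verifier_usr = 0.0
    total_usr = None
    for raw in report.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        total = TOTAL.match(line)
        if total:
            total_usr = float(total.group("usr"))
            continue
        row = ROW.match(line)
        if row and "verif" in row.group("name").lower():
            verifier_usr += float(row.group("usr"))
    if total_usr is None:
        raise ValueError("no TOTAL row found in report")
    return verifier_usr, total_usr


if __name__ == "__main__":
    usr, total = verifier_share(SAMPLE)
    print(f"verifiers: {usr:.2f}s of {total:.2f}s usr "
          f"({100 * usr / total:.1f}%)")
```

On the rows above the verifier timevars add up to roughly a third of the
total usr time. That is in line with the advice to rerun with -fno-checking,
or with a compiler configured with --enable-checking=release, before drawing
conclusions about which passes dominate.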