On Fri, Apr 22, 2011 at 1:58 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> I ran the patch on Mozilla. Without the patch it is:
> Execution times (seconds)
>  garbage collection    :  20.19 ( 3%) usr   0.02 ( 0%) sys  20.22 ( 3%) wall       0 kB ( 0%) ggc
>  callgraph optimization:   3.53 ( 1%) usr   0.01 ( 0%) sys   3.53 ( 1%) wall   15248 kB ( 1%) ggc
>  varpool construction  :   0.77 ( 0%) usr   0.02 ( 0%) sys   0.80 ( 0%) wall   51607 kB ( 4%) ggc
>  ipa cp                :   2.12 ( 0%) usr   0.10 ( 1%) sys   2.23 ( 0%) wall  119701 kB (10%) ggc
>  ipa lto gimple in     :   0.07 ( 0%) usr   0.02 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple out    :  11.63 ( 2%) usr   1.01 ( 8%) sys  12.63 ( 2%) wall       0 kB ( 0%) ggc
>  ipa lto decl in       : 182.15 (28%) usr   4.06 (32%) sys 188.10 (28%) wall  392863 kB (31%) ggc
>  ipa lto decl out      : 149.86 (23%) usr   0.32 ( 3%) sys 150.25 (22%) wall       0 kB ( 0%) ggc
>  ipa lto decl init I/O :   0.14 ( 0%) usr   0.03 ( 0%) sys   0.16 ( 0%) wall      31 kB ( 0%) ggc
>  ipa lto cgraph I/O    :   2.09 ( 0%) usr   0.27 ( 2%) sys   2.37 ( 0%) wall  428623 kB (34%) ggc
>  ipa lto decl merge    : 219.70 (33%) usr   1.93 (15%) sys 221.75 (33%) wall  162687 kB (13%) ggc
>  ipa lto cgraph merge  :   2.68 ( 0%) usr   0.00 ( 0%) sys   2.69 ( 0%) wall   15895 kB ( 1%) ggc
>  whopr wpa             :   1.65 ( 0%) usr   0.04 ( 0%) sys   1.71 ( 0%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O         :   2.20 ( 0%) usr   4.55 (36%) sys   7.20 ( 1%) wall       0 kB ( 0%) ggc
>  ipa reference         :   4.12 ( 1%) usr   0.00 ( 0%) sys   4.09 ( 1%) wall       0 kB ( 0%) ggc
>  ipa profile           :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const        :   3.15 ( 0%) usr   0.04 ( 0%) sys   3.19 ( 0%) wall       0 kB ( 0%) ggc
>  parser                :   1.56 ( 0%) usr   0.00 ( 0%) sys   1.56 ( 0%) wall   37684 kB ( 3%) ggc
>  inline heuristics     :  47.26 ( 7%) usr   0.05 ( 0%) sys  47.33 ( 7%) wall   21988 kB ( 2%) ggc
>  callgraph verifier    :   0.42 ( 0%) usr   0.04 ( 0%) sys   0.47 ( 0%) wall       0 kB ( 0%) ggc
>  varconst              :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo      :   1.19 ( 0%) usr   0.00 ( 0%) sys   1.17 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 : 657.07            12.64           672.26            1247550 kB
>
> Note that total GGC use seems obviously wrong. The peak GGC report reads:
> {GC 4079042k -> 4043085k}
>
> With the patch:
> Execution times (seconds)
>  garbage collection    :  13.85 ( 3%) usr   0.02 ( 0%) sys  13.88 ( 3%) wall       0 kB ( 0%) ggc
>  callgraph optimization:   2.40 ( 0%) usr   0.00 ( 0%) sys   2.40 ( 0%) wall   15248 kB ( 1%) ggc
>  varpool construction  :   0.69 ( 0%) usr   0.03 ( 0%) sys   0.71 ( 0%) wall   51621 kB ( 4%) ggc
>  ipa cp                :   1.86 ( 0%) usr   0.11 ( 1%) sys   1.97 ( 0%) wall  119697 kB ( 9%) ggc
>  ipa lto gimple in     :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple out    :  11.86 ( 2%) usr   0.92 ( 9%) sys  12.80 ( 2%) wall       0 kB ( 0%) ggc
>  ipa lto decl in       : 287.52 (54%) usr   3.49 (35%) sys 291.13 (54%) wall  713694 kB (51%) ggc
>  ipa lto decl out      : 127.76 (24%) usr   0.94 ( 9%) sys 128.79 (24%) wall       0 kB ( 0%) ggc
>  ipa lto decl init I/O :   0.13 ( 0%) usr   0.02 ( 0%) sys   0.15 ( 0%) wall      31 kB ( 0%) ggc
>  ipa lto cgraph I/O    :   1.66 ( 0%) usr   0.29 ( 3%) sys   1.94 ( 0%) wall  428623 kB (30%) ggc
>  ipa lto decl merge    :  18.12 ( 3%) usr   0.13 ( 1%) sys  18.26 ( 3%) wall     978 kB ( 0%) ggc
>  ipa lto cgraph merge  :   1.90 ( 0%) usr   0.00 ( 0%) sys   1.91 ( 0%) wall   15143 kB ( 1%) ggc
>  whopr wpa             :   1.99 ( 0%) usr   0.05 ( 0%) sys   2.01 ( 0%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O         :   2.40 ( 0%) usr   3.77 (38%) sys   6.47 ( 1%) wall       0 kB ( 0%) ggc
>  ipa reference         :   4.56 ( 1%) usr   0.00 ( 0%) sys   4.58 ( 1%) wall       0 kB ( 0%) ggc
>  ipa profile           :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const        :   3.33 ( 1%) usr   0.03 ( 0%) sys   3.36 ( 1%) wall       0 kB ( 0%) ggc
>  parser                :   1.85 ( 0%) usr   0.03 ( 0%) sys   1.87 ( 0%) wall   37684 kB ( 3%) ggc
>  inline heuristics     :  47.34 ( 9%) usr   0.04 ( 0%) sys  47.42 ( 9%) wall   21988 kB ( 2%) ggc
>  tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
>  callgraph verifier    :   0.45 ( 0%) usr   0.05 ( 0%) sys   0.55 ( 0%) wall       0 kB ( 0%) ggc
>  varconst              :   0.00 ( 0%) usr   0.03 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo      :   1.38 ( 0%) usr   0.00 ( 0%) sys   1.37 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 : 531.66            10.05           542.31            1405930 kB
>
> and peak memory use 2688637k -> 2136908k. So 50% GGC memory (we need about
> another 4G for non-GGC memory, probably largely in the mmap pool) and 23%
> compile time improvements.
>
> So great job! And as a note for myself, the inliner facelifting made it 3.5
> times slower here. It is obviously because of recomputing badness. I do have
> a plan for this.
>
> Note that this is a non-debugging build. We are still way above my original
> results from the GCC Summit paper, which were
>  TOTAL                 : 186.41             8.27           195.10            3491946 kB
> I think most of the slowdown was caused by making free-lang-data not free
> stuff that might make dwarf2out ICE. DECL in was 48s, merge 45s, decl out 48s,
> inliner 15s.

Yes, that's very likely. If we get around to redoing the LTO option-saving
code, we might want to forbid compiling with -g0 and linking with -g
(dropping -g at link time as soon as we see a single module compiled with
-g0). Then we could free some more stuff, at least with -g0, though I'm not
sure -g0 matters in practice.

Maybe we can shift numbers back from ipa lto decl in to ipa lto decl merge
by some timevar adjustments?

Richard.

> Honza
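
For reference, the arithmetic behind the timevar question can be sketched in a few lines of Python. This is only an illustration using the wall-clock figures quoted above; the names `WALL_BEFORE`, `WALL_AFTER`, `delta` and `combined` are made up for this sketch and are not part of GCC.

```python
# Wall-clock seconds copied from the two timing reports quoted above.
# All names below are hypothetical and exist only for this illustration.
WALL_BEFORE = {
    "ipa lto decl in": 188.10,
    "ipa lto decl merge": 221.75,
    "TOTAL": 672.26,
}
WALL_AFTER = {
    "ipa lto decl in": 291.13,
    "ipa lto decl merge": 18.26,
    "TOTAL": 542.31,
}


def delta(phase):
    """Change in wall time (after minus before) for one phase, in seconds."""
    return round(WALL_AFTER[phase] - WALL_BEFORE[phase], 2)


def combined(report):
    """Sum of the two timevars that the reply suggests rebalancing."""
    return round(report["ipa lto decl in"] + report["ipa lto decl merge"], 2)


if __name__ == "__main__":
    for phase in WALL_BEFORE:
        print(f"{phase:20s} {WALL_BEFORE[phase]:8.2f} -> "
              f"{WALL_AFTER[phase]:8.2f} ({delta(phase):+.2f} s)")
    print(f"{'decl in + decl merge':20s} {combined(WALL_BEFORE):8.2f} -> "
          f"{combined(WALL_AFTER):8.2f}")
```

With the quoted numbers, decl in grows by about 103 s while decl merge shrinks by about 203 s, so the pair together is roughly 100 s faster. How that time is attributed between the two timevars is the open question.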