Hi,
I ran the patch on Mozilla. Without the patch it is:

Execution times (seconds)
 garbage collection    :  20.19 ( 3%) usr  0.02 ( 0%) sys  20.22 ( 3%) wall       0 kB ( 0%) ggc
 callgraph optimization:   3.53 ( 1%) usr  0.01 ( 0%) sys   3.53 ( 1%) wall   15248 kB ( 1%) ggc
 varpool construction  :   0.77 ( 0%) usr  0.02 ( 0%) sys   0.80 ( 0%) wall   51607 kB ( 4%) ggc
 ipa cp                :   2.12 ( 0%) usr  0.10 ( 1%) sys   2.23 ( 0%) wall  119701 kB (10%) ggc
 ipa lto gimple in     :   0.07 ( 0%) usr  0.02 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out    :  11.63 ( 2%) usr  1.01 ( 8%) sys  12.63 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto decl in       : 182.15 (28%) usr  4.06 (32%) sys 188.10 (28%) wall  392863 kB (31%) ggc
 ipa lto decl out      : 149.86 (23%) usr  0.32 ( 3%) sys 150.25 (22%) wall       0 kB ( 0%) ggc
 ipa lto decl init I/O :   0.14 ( 0%) usr  0.03 ( 0%) sys   0.16 ( 0%) wall      31 kB ( 0%) ggc
 ipa lto cgraph I/O    :   2.09 ( 0%) usr  0.27 ( 2%) sys   2.37 ( 0%) wall  428623 kB (34%) ggc
 ipa lto decl merge    : 219.70 (33%) usr  1.93 (15%) sys 221.75 (33%) wall  162687 kB (13%) ggc
 ipa lto cgraph merge  :   2.68 ( 0%) usr  0.00 ( 0%) sys   2.69 ( 0%) wall   15895 kB ( 1%) ggc
 whopr wpa             :   1.65 ( 0%) usr  0.04 ( 0%) sys   1.71 ( 0%) wall       1 kB ( 0%) ggc
 whopr wpa I/O         :   2.20 ( 0%) usr  4.55 (36%) sys   7.20 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference         :   4.12 ( 1%) usr  0.00 ( 0%) sys   4.09 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile           :   0.18 ( 0%) usr  0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   3.15 ( 0%) usr  0.04 ( 0%) sys   3.19 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   1.56 ( 0%) usr  0.00 ( 0%) sys   1.56 ( 0%) wall   37684 kB ( 3%) ggc
 inline heuristics     :  47.26 ( 7%) usr  0.05 ( 0%) sys  47.33 ( 7%) wall   21988 kB ( 2%) ggc
 callgraph verifier    :   0.42 ( 0%) usr  0.04 ( 0%) sys   0.47 ( 0%) wall       0 kB ( 0%) ggc
 varconst              :   0.02 ( 0%) usr  0.01 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo      :   1.19 ( 0%) usr  0.00 ( 0%) sys   1.17 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 657.07           12.64           672.26            1247550 kB

Note that the total GGC use seems obviously wrong. The peak GGC report reads:
{GC 4079042k -> 4043085k}

With the patch:

Execution times (seconds)
 garbage collection    :  13.85 ( 3%) usr  0.02 ( 0%) sys  13.88 ( 3%) wall       0 kB ( 0%) ggc
 callgraph optimization:   2.40 ( 0%) usr  0.00 ( 0%) sys   2.40 ( 0%) wall   15248 kB ( 1%) ggc
 varpool construction  :   0.69 ( 0%) usr  0.03 ( 0%) sys   0.71 ( 0%) wall   51621 kB ( 4%) ggc
 ipa cp                :   1.86 ( 0%) usr  0.11 ( 1%) sys   1.97 ( 0%) wall  119697 kB ( 9%) ggc
 ipa lto gimple in     :   0.04 ( 0%) usr  0.02 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out    :  11.86 ( 2%) usr  0.92 ( 9%) sys  12.80 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto decl in       : 287.52 (54%) usr  3.49 (35%) sys 291.13 (54%) wall  713694 kB (51%) ggc
 ipa lto decl out      : 127.76 (24%) usr  0.94 ( 9%) sys 128.79 (24%) wall       0 kB ( 0%) ggc
 ipa lto decl init I/O :   0.13 ( 0%) usr  0.02 ( 0%) sys   0.15 ( 0%) wall      31 kB ( 0%) ggc
 ipa lto cgraph I/O    :   1.66 ( 0%) usr  0.29 ( 3%) sys   1.94 ( 0%) wall  428623 kB (30%) ggc
 ipa lto decl merge    :  18.12 ( 3%) usr  0.13 ( 1%) sys  18.26 ( 3%) wall     978 kB ( 0%) ggc
 ipa lto cgraph merge  :   1.90 ( 0%) usr  0.00 ( 0%) sys   1.91 ( 0%) wall   15143 kB ( 1%) ggc
 whopr wpa             :   1.99 ( 0%) usr  0.05 ( 0%) sys   2.01 ( 0%) wall       1 kB ( 0%) ggc
 whopr wpa I/O         :   2.40 ( 0%) usr  3.77 (38%) sys   6.47 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference         :   4.56 ( 1%) usr  0.00 ( 0%) sys   4.58 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile           :   0.16 ( 0%) usr  0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   3.33 ( 1%) usr  0.03 ( 0%) sys   3.36 ( 1%) wall       0 kB ( 0%) ggc
 parser                :   1.85 ( 0%) usr  0.03 ( 0%) sys   1.87 ( 0%) wall   37684 kB ( 3%) ggc
 inline heuristics     :  47.34 ( 9%) usr  0.04 ( 0%) sys  47.42 ( 9%) wall   21988 kB ( 2%) ggc
 tree CFG construction :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 callgraph verifier    :   0.45 ( 0%) usr  0.05 ( 0%) sys   0.55 ( 0%) wall       0 kB ( 0%) ggc
 varconst              :   0.00 ( 0%) usr  0.03 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo      :   1.38 ( 0%) usr  0.00 ( 0%) sys   1.37 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 531.66           10.05           542.31            1405930 kB

and the peak memory use is 2688637k -> 2136908k.

So that is a 50% GGC memory improvement (we need about another 4G for
non-GGC memory, probably largely in the mmap pool) and a 23% compile time
improvement. So great job!

And as a note for myself, the inliner facelifting made it 3.5 times slower
here. It is obviously because of recomputing badness. I do have a plan for
this.

Note that this is a non-debugging build. We are still way above my original
results from the GCC Summit paper, which were:

 TOTAL                 : 186.41            8.27           195.10            3491946 kB

I think most of the slowdown was caused by making free-lang-data not free
stuff that might make dwarf2out ICE. Back then decl in was 48s, decl merge
45s, decl out 48s, and the inliner 15s.

Honza
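
For anyone who wants to re-derive the percentages quoted above, here is a minimal Python sketch. The numbers are copied from the two TOTAL lines and the two peak GC reports in this mail; the helper names (`speedup_pct`, `reduction_pct`) and the constant names are made up for illustration, and reading "{GC a -> b}" as heap size before and after a collection is an assumption.

```python
# Cross-check of the improvements quoted in the mail.
# Each tuple is (without patch, with patch), copied from the reports above.

def speedup_pct(before: float, after: float) -> float:
    """Percentage by which the patched run is faster (before/after - 1)."""
    return (before / after - 1.0) * 100.0


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a quantity shrank relative to its old value."""
    return (1.0 - after / before) * 100.0


TOTAL_USR = (657.07, 531.66)              # seconds, TOTAL usr
TOTAL_WALL = (672.26, 542.31)             # seconds, TOTAL wall
PEAK_GGC_BEFORE_GC_K = (4079042, 2688637)  # left side of "{GC a -> b}" (assumed pre-collection)
PEAK_GGC_AFTER_GC_K = (4043085, 2136908)   # right side of "{GC a -> b}" (assumed post-collection)
DECL_MERGE_USR = (219.70, 18.12)          # seconds, "ipa lto decl merge"
DECL_IN_USR = (182.15, 287.52)            # seconds, "ipa lto decl in"

if __name__ == "__main__":
    print(f"usr speedup:           {speedup_pct(*TOTAL_USR):.1f}%")
    print(f"wall speedup:          {speedup_pct(*TOTAL_WALL):.1f}%")
    print(f"peak GGC (before GC):  {reduction_pct(*PEAK_GGC_BEFORE_GC_K):.1f}% smaller")
    print(f"peak GGC (after GC):   {reduction_pct(*PEAK_GGC_AFTER_GC_K):.1f}% smaller")
    print(f"decl merge usr ratio:  {DECL_MERGE_USR[0] / DECL_MERGE_USR[1]:.1f}x faster")
    print(f"decl in usr ratio:     {DECL_IN_USR[1] / DECL_IN_USR[0]:.2f}x slower")
```

Under these assumptions the user and wall time ratios both come out a little under 24%, and the post-collection peak shrinks by roughly 47%, which is consistent with the rounded 23% and 50% figures above. The sketch also makes it visible that decl merge got much cheaper while decl in got more expensive.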