On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
> > While a plain Firefox -flto build works fine, an LTO/PGO build fails with:
> >
> > lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
> > 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
> >         ../../gcc/gcc/ipa-utils.c:540
> > 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
> >         ../../gcc/gcc/ipa-icf.c:753
> > 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
> >         ../../gcc/gcc/ipa-icf.c:2706
> > 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
> >         ../../gcc/gcc/ipa-icf.c:2098
> > 0xf1d3f1 ipa_icf_driver
> >         ../../gcc/gcc/ipa-icf.c:2784
> > 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
> >         ../../gcc/gcc/ipa-icf.c:2831
> >
> > The pass is also very memory hungry (from 3GB without ICF to 4GB during
> > libxul link), while the code size savings are in the 1% range.
>
> Thanks for checking. I was just thinking about doing that myself. Would
> you mind posting the -ftime-report output of the Firefox WPA stage?
(without ICF)

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall  403069 kB (12%) ggc
 phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall 2944210 kB (88%) ggc
 phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall       0 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
 ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall       0 kB ( 0%) ggc
 ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall   10365 kB ( 0%) ggc
 ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall  173701 kB ( 5%) ggc
 ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall  532704 kB (16%) ggc
 ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall 2226088 kB (66%) ggc
 ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall   14226 kB ( 0%) ggc
 ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall  364151 kB (11%) ggc
 ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall   12716 kB ( 0%) ggc
 whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall       1 kB ( 0%) ggc
 whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall    4938 kB ( 0%) ggc
 ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall       0 kB ( 0%) ggc
 tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   :  92.91             4.29           151.73            3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

(with ICF)

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall 1468975 kB (33%) ggc
 phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall 2944210 kB (67%) ggc
 phase stream out        :   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall      12 kB ( 0%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 garbage collection      :   7.01 ( 6%) usr   0.00 ( 0%) sys   6.99 ( 4%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
 ipa dead code removal   :   6.98 ( 6%) usr   0.13 ( 2%) sys   6.89 ( 4%) wall       0 kB ( 0%) ggc
 ipa virtual call target :   6.93 ( 6%) usr   0.03 ( 0%) sys   7.20 ( 4%) wall       6 kB ( 0%) ggc
 ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall   10365 kB ( 0%) ggc
 ipa cp                  :   1.87 ( 2%) usr   0.11 ( 2%) sys   2.00 ( 1%) wall  167204 kB ( 4%) ggc
 ipa inlining heuristics :  17.15 (15%) usr   0.21 ( 3%) sys  17.35 ( 9%) wall  512636 kB (12%) ggc
 ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   5.17 ( 4%) usr   1.04 (17%) sys   6.51 ( 3%) wall  855058 kB (19%) ggc
 ipa lto gimple out      :   0.38 ( 0%) usr   0.08 ( 1%) sys   3.07 ( 2%) wall      12 kB ( 0%) ggc
 ipa lto decl in         :  18.38 (16%) usr   0.56 ( 9%) sys  18.95 (10%) wall 2226088 kB (50%) ggc
 ipa lto decl out        :   3.95 ( 3%) usr   0.08 ( 1%) sys   4.03 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.29 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall   14389 kB ( 0%) ggc
 ipa lto constructors out:   0.10 ( 0%) usr   0.03 ( 0%) sys   0.58 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.91 ( 1%) usr   0.10 ( 2%) sys   1.02 ( 1%) wall  364151 kB ( 8%) ggc
 ipa lto decl merge      :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.65 ( 1%) usr   0.01 ( 0%) sys   1.66 ( 1%) wall   12716 kB ( 0%) ggc
 whopr wpa               :   1.81 ( 2%) usr   0.01 ( 0%) sys   1.85 ( 1%) wall       1 kB ( 0%) ggc
 whopr wpa I/O           :   0.05 ( 0%) usr   1.71 (27%) sys  65.75 (34%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   5.05 ( 4%) usr   0.00 ( 0%) sys   5.06 ( 3%) wall    5012 kB ( 0%) ggc
 ipa reference           :   2.13 ( 2%) usr   0.03 ( 0%) sys   2.16 ( 1%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.57 ( 2%) usr   0.00 ( 0%) sys   2.56 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   6.88 ( 6%) usr   0.08 ( 1%) sys   7.01 ( 4%) wall     855 kB ( 0%) ggc
 tree SSA rewrite        :   0.23 ( 0%) usr   0.06 ( 1%) sys   0.28 ( 0%) wall   33946 kB ( 1%) ggc
 tree SSA incremental    :   0.42 ( 0%) usr   0.05 ( 1%) sys   0.53 ( 0%) wall   21099 kB ( 0%) ggc
 tree operand scan       :   0.47 ( 0%) usr   0.08 ( 1%) sys   0.34 ( 0%) wall  181275 kB ( 4%) ggc
 tree STMT verifier      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 callgraph verifier      :  22.76 (19%) usr   1.68 (27%) sys  24.44 (13%) wall       0 kB ( 0%) ggc
 dominance frontiers     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.19 ( 0%) usr   0.05 ( 1%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 loop fini               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.82 ( 1%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                   : 117.68             6.23           191.15            4414612 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

> It seems that in this case we reject too many equality candidates?
> I think the original numbers were about 4-5%, but later some equivalences
> were disabled because of devirt/aliasing issues. Did you compare it with
> gold ICF enabled? There are quite a few obvious improvements to the analysis
> that can be done, but I guess we need to analyze the interesting cases one
> by one.

Gold ICF was enabled (-Wl,--icf=all,--icf-iterations=3).

-- 
Markus