On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
> > While a plain Firefox -flto build works fine, an LTO/PGO build fails with:
> >
> > lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
> > 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
> > ../../gcc/gcc/ipa-utils.c:540
> > 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
> > ../../gcc/gcc/ipa-icf.c:753
> > 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
> > ../../gcc/gcc/ipa-icf.c:2706
> > 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
> > ../../gcc/gcc/ipa-icf.c:2098
> > 0xf1d3f1 ipa_icf_driver
> > ../../gcc/gcc/ipa-icf.c:2784
> > 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
> > ../../gcc/gcc/ipa-icf.c:2831
> >
> >
> > The pass is also very memory hungry (from 3GB without ICF to 4GB during
> > libxul link), while the code size savings are in the 1% range.
>
> Thanks for checking. I was just thinking about doing that myself. Would
> you mind posting -ftime-report of the Firefox WPA stage?
(without ICF)
Execution times (seconds)
phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall  403069 kB (12%) ggc
phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall 2944210 kB (88%) ggc
phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall       0 kB ( 0%) ggc
phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall       0 kB ( 0%) ggc
callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall       0 kB ( 0%) ggc
ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall       0 kB ( 0%) ggc
ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall   10365 kB ( 0%) ggc
ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall  173701 kB ( 5%) ggc
ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall  532704 kB (16%) ggc
ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall       0 kB ( 0%) ggc
ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall 2226088 kB (66%) ggc
ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall       0 kB ( 0%) ggc
ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall   14226 kB ( 0%) ggc
ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall  364151 kB (11%) ggc
ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall   12716 kB ( 0%) ggc
whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall       1 kB ( 0%) ggc
whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall       0 kB ( 0%) ggc
whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall    4938 kB ( 0%) ggc
ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall       0 kB ( 0%) ggc
ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall       0 kB ( 0%) ggc
tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall       0 kB ( 0%) ggc
dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
TOTAL                   :  92.91             4.29           151.73            3348693 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
(with ICF)
Execution times (seconds)
phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall 1468975 kB (33%) ggc
phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall 2944210 kB (67%) ggc
phase stream out        :   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall      12 kB ( 0%) ggc
phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
garbage collection      :   7.01 ( 6%) usr   0.00 ( 0%) sys   6.99 ( 4%) wall       0 kB ( 0%) ggc
callgraph optimization  :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
ipa dead code removal   :   6.98 ( 6%) usr   0.13 ( 2%) sys   6.89 ( 4%) wall       0 kB ( 0%) ggc
ipa virtual call target :   6.93 ( 6%) usr   0.03 ( 0%) sys   7.20 ( 4%) wall       6 kB ( 0%) ggc
ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall   10365 kB ( 0%) ggc
ipa cp                  :   1.87 ( 2%) usr   0.11 ( 2%) sys   2.00 ( 1%) wall  167204 kB ( 4%) ggc
ipa inlining heuristics :  17.15 (15%) usr   0.21 ( 3%) sys  17.35 ( 9%) wall  512636 kB (12%) ggc
ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
ipa lto gimple in       :   5.17 ( 4%) usr   1.04 (17%) sys   6.51 ( 3%) wall  855058 kB (19%) ggc
ipa lto gimple out      :   0.38 ( 0%) usr   0.08 ( 1%) sys   3.07 ( 2%) wall      12 kB ( 0%) ggc
ipa lto decl in         :  18.38 (16%) usr   0.56 ( 9%) sys  18.95 (10%) wall 2226088 kB (50%) ggc
ipa lto decl out        :   3.95 ( 3%) usr   0.08 ( 1%) sys   4.03 ( 2%) wall       0 kB ( 0%) ggc
ipa lto constructors in :   0.29 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall   14389 kB ( 0%) ggc
ipa lto constructors out:   0.10 ( 0%) usr   0.03 ( 0%) sys   0.58 ( 0%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O      :   0.91 ( 1%) usr   0.10 ( 2%) sys   1.02 ( 1%) wall  364151 kB ( 8%) ggc
ipa lto decl merge      :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
ipa lto cgraph merge    :   1.65 ( 1%) usr   0.01 ( 0%) sys   1.66 ( 1%) wall   12716 kB ( 0%) ggc
whopr wpa               :   1.81 ( 2%) usr   0.01 ( 0%) sys   1.85 ( 1%) wall       1 kB ( 0%) ggc
whopr wpa I/O           :   0.05 ( 0%) usr   1.71 (27%) sys  65.75 (34%) wall       0 kB ( 0%) ggc
whopr partitioning      :   5.05 ( 4%) usr   0.00 ( 0%) sys   5.06 ( 3%) wall    5012 kB ( 0%) ggc
ipa reference           :   2.13 ( 2%) usr   0.03 ( 0%) sys   2.16 ( 1%) wall       0 kB ( 0%) ggc
ipa profile             :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
ipa pure const          :   2.57 ( 2%) usr   0.00 ( 0%) sys   2.56 ( 1%) wall       0 kB ( 0%) ggc
ipa icf                 :   6.88 ( 6%) usr   0.08 ( 1%) sys   7.01 ( 4%) wall     855 kB ( 0%) ggc
tree SSA rewrite        :   0.23 ( 0%) usr   0.06 ( 1%) sys   0.28 ( 0%) wall   33946 kB ( 1%) ggc
tree SSA incremental    :   0.42 ( 0%) usr   0.05 ( 1%) sys   0.53 ( 0%) wall   21099 kB ( 0%) ggc
tree operand scan       :   0.47 ( 0%) usr   0.08 ( 1%) sys   0.34 ( 0%) wall  181275 kB ( 4%) ggc
tree STMT verifier      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
callgraph verifier      :  22.76 (19%) usr   1.68 (27%) sys  24.44 (13%) wall       0 kB ( 0%) ggc
dominance frontiers     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
dominance computation   :   0.19 ( 0%) usr   0.05 ( 1%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
varconst                :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
loop fini               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
unaccounted todo        :   0.82 ( 1%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc
TOTAL                   : 117.68             6.23           191.15            4414612 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
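As a rough cross-check of the memory figure quoted earlier (about 3GB without ICF versus 4GB with it), the TOTAL rows of the two reports can be compared directly. The helper below is a hypothetical sketch, not part of GCC; `parse_total` and `pct_change` are illustrative names, and it assumes the TOTAL row lists user, system and wall seconds followed by the GGC total in kB, as in the output above.

```python
import re
from typing import NamedTuple


class Totals(NamedTuple):
    usr: float   # user CPU seconds
    sys: float   # system CPU seconds
    wall: float  # wall-clock seconds
    ggc_kb: int  # GC-managed memory allocated, in kB


# \s+ also matches newlines, so this works whether or not the kB figure
# was wrapped onto its own line by a mail client.
TOTAL_RE = re.compile(
    r"TOTAL\s*:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s*kB")


def parse_total(report: str) -> Totals:
    """Extract the TOTAL row from -ftime-report text (hypothetical helper)."""
    m = TOTAL_RE.search(report)
    if m is None:
        raise ValueError("no TOTAL row found")
    usr, sys_, wall, kb = m.groups()
    return Totals(float(usr), float(sys_), float(wall), int(kb))


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


without_icf = parse_total("TOTAL : 92.91 4.29 151.73\n3348693 kB")
with_icf = parse_total("TOTAL : 117.68 6.23 191.15\n4414612 kB")

for field in Totals._fields:
    b, a = getattr(without_icf, field), getattr(with_icf, field)
    print(f"{field:>6}: {b:>10} -> {a:>10} ({pct_change(b, a):+.1f}%)")
```

On these two reports it works out to roughly +27% user time, +45% system time, +26% wall time and +32% GGC memory (1,065,919 kB, just over 1 GiB), which is consistent with the 3GB to 4GB jump.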
> It seems that in this case we reject too many equality candidates?
> I think the original numbers were about 4-5%, but later some equivalences were
> disabled because of devirt/aliasing issues. Do you compare it with gold ICF
> enabled? There are quite a few obvious improvements to the analysis that can
> be done, but I guess we need to analyze the interesting cases one by one.
Gold ICF was enabled (-Wl,--icf=all,--icf-iterations=3).
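For readers less familiar with what "rejecting equality candidates" means, identical code folding commonly has a two-step shape: bucket functions by a cheap key, then confirm exact equivalence within each bucket before merging. The toy sketch below illustrates only that general shape; it is not GCC's ipa-icf implementation, and the function-body model (a tuple of instruction strings) and the name `fold_identical` are hypothetical. A candidate is "rejected" when it lands in a bucket with another function but fails the exact comparison, as `c` does here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model: a function body is a tuple of instruction strings.
Body = Tuple[str, ...]


def fold_identical(funcs: Dict[str, Body]) -> Dict[str, str]:
    """Map each function name to the representative it would fold into."""
    # Step 1: bucket by a cheap key (body length plus first instruction).
    buckets: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for name in sorted(funcs):
        body = funcs[name]
        key = (len(body), body[0] if body else "")
        buckets[key].append(name)

    # Step 2: inside each bucket, merge only on exact equality.
    alias: Dict[str, str] = {}
    for names in buckets.values():
        reps: List[str] = []
        for name in names:
            for rep in reps:
                if funcs[rep] == funcs[name]:
                    alias[name] = rep  # folded into an existing copy
                    break
            else:
                reps.append(name)      # candidate rejected: becomes its own class
                alias[name] = name
    return alias


funcs = {
    "a": ("load x", "add 1", "ret"),
    "b": ("load x", "add 1", "ret"),  # identical to a
    "c": ("load x", "add 2", "ret"),  # same bucket as a, but differs
    "d": ("ret",),
}
print(fold_identical(funcs))  # {'a': 'a', 'b': 'a', 'c': 'c', 'd': 'd'}
```

A real implementation has to decide equivalence on far richer information (types, aliasing, virtual-call semantics), which is where a conservative check turns away candidates a linker-level fold like gold's `--icf` might still merge.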
--
Markus