> > I am giving the patch a brief benchmark on profiledbootstrap, and if it
> > won't cause a major regression, I think we should go ahead with the patch.
Uhm, I did a profiledbootstrap, and we are a bit too fast to get a reasonable oprofile.
What I get is:
samples %       image name      app name        symbol name
7443    9.4372  lto1            lto1            lto_end_uncompression(lto_compression_stream*)
4438    5.6271  lto1            lto1            _ZL14DFS_write_treeP12output_blockP4sccsP9tree_nodebb.lto_priv.4993
2351    2.9809  lto1            lto1            lto_output_tree(output_block*, tree_node*, bool, bool)
2179    2.7628  lto1            lto1            _ZL30linemap_macro_loc_to_exp_pointP9line_mapsjPPK8line_map.lto_priv.7860
1910    2.4217  lto1            lto1            _ZL19unpack_value_fieldsP7data_inP9bitpack_dP9tree_node.lto_priv.7292
1855    2.3520  libc-2.11.1.so  libc-2.11.1.so  msort_with_tmp
1531    1.9412  lto1            lto1            streamer_string_index(output_block*, char const*, unsigned int, bool)
1530    1.9399  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
1471    1.8651  lto1            lto1            do_estimate_growth(cgraph_node*)
1306    1.6559  lto1            lto1            pointer_map_insert(pointer_map_t*, void const*)
1238    1.5697  lto1            lto1            _Z28streamer_pack_tree_bitfieldsP12output_blockP9bitpack_dP9tree_node.constprop.1086
1138    1.4429  lto1            lto1            compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
1082    1.3719  lto1            lto1            streamer_write_tree_body(output_block*, tree_node*, bool)
1044    1.3237  lto1            lto1            _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_j3vecIP9tree_node7va_heap6vl_ptrES7_S2_IP21ipa_agg_jump_function
WPA on GCC takes 12 seconds (with my fork patch):
Execution times (seconds)
phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1412 kB ( 0%) ggc
phase opt and generate  :   4.48 (37%) usr   0.05 ( 6%) sys   4.57 (34%) wall   42983 kB ( 7%) ggc
phase stream in         :   7.21 (60%) usr   0.26 (32%) sys   7.47 (56%) wall  565102 kB (93%) ggc
phase stream out        :   0.38 ( 3%) usr   0.50 (62%) sys   1.37 (10%) wall     623 kB ( 0%) ggc
callgraph optimization  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
ipa dead code removal   :   0.46 ( 4%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall       0 kB ( 0%) ggc
ipa cp                  :   0.36 ( 3%) usr   0.01 ( 1%) sys   0.41 ( 3%) wall   38261 kB ( 6%) ggc
ipa inlining heuristics :   2.84 (24%) usr   0.05 ( 6%) sys   2.87 (21%) wall   60263 kB (10%) ggc
ipa lto gimple in       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
ipa lto gimple out      :   0.04 ( 0%) usr   0.02 ( 2%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
ipa lto decl in         :   6.23 (52%) usr   0.18 (22%) sys   6.40 (48%) wall  425731 kB (70%) ggc
ipa lto decl out        :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O      :   0.22 ( 2%) usr   0.02 ( 2%) sys   0.25 ( 2%) wall   60840 kB (10%) ggc
ipa lto decl merge      :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall    1051 kB ( 0%) ggc
ipa lto cgraph merge    :   0.22 ( 2%) usr   0.01 ( 1%) sys   0.25 ( 2%) wall   17676 kB ( 3%) ggc
whopr wpa               :   0.38 ( 3%) usr   0.00 ( 0%) sys   0.35 ( 3%) wall     626 kB ( 0%) ggc
whopr wpa I/O           :   0.01 ( 0%) usr   0.47 (58%) sys   0.98 ( 7%) wall       0 kB ( 0%) ggc
whopr partitioning      :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall       0 kB ( 0%) ggc
ipa reference           :   0.31 ( 3%) usr   0.01 ( 1%) sys   0.33 ( 2%) wall       0 kB ( 0%) ggc
ipa profile             :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall     150 kB ( 0%) ggc
ipa pure const          :   0.29 ( 2%) usr   0.00 ( 0%) sys   0.30 ( 2%) wall       0 kB ( 0%) ggc
tree SSA incremental    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     203 kB ( 0%) ggc
tree operand scan       :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall    3512 kB ( 1%) ggc
dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
unaccounted todo        :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 1%) wall       0 kB ( 0%) ggc
TOTAL                   :  12.08             0.81            13.43             610123 kB
Inlining heuristics were also around 25% without your change. The timing matches
my experience with Firefox: growth estimation tends to be the hot function, and
with caching, badness is off the radar. As such, I think the patch is safe to go in.
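To make the caching point concrete, here is a minimal C++ sketch under a crude, assumed growth model (every call site receives a copy of the body and the offline copy is then removed). `Node`, `GrowthCache`, `estimate_growth_uncached` and `query_rounds` are hypothetical names for illustration only; they are not GCC's `cgraph_node` or `do_estimate_growth` internals. The idea shown is that once growth estimates are memoized per node and only invalidated when that node changes, repeated badness queries stop paying for re-estimation.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical call-graph node (not GCC's cgraph_node): the estimated body
// size and the number of call sites that would receive an inlined copy.
struct Node {
  int id;
  int size;
  int call_sites;
};

// Memoizes per-node growth estimates. Entries must be invalidated whenever a
// node's body or callers change, e.g. after an inlining decision.
class GrowthCache {
 public:
  int growth(const Node &n) {
    auto it = cache_.find(n.id);
    if (it != cache_.end()) return it->second;  // cache hit: no re-estimation
    int g = estimate_growth_uncached(n);
    cache_.emplace(n.id, g);
    return g;
  }

  void invalidate(int id) { cache_.erase(id); }

  // Number of times the expensive estimator actually ran.
  int estimator_runs() const { return runs_; }

 private:
  // Stand-in for an expensive whole-callgraph walk. Assumed model: every call
  // site gets a copy of the body, and the offline copy is then removed.
  int estimate_growth_uncached(const Node &n) {
    assert(n.call_sites >= 0);
    ++runs_;
    return n.size * n.call_sites - n.size;
  }

  std::unordered_map<int, int> cache_;
  int runs_ = 0;
};

// A badness-style loop that queries growth many times per node only pays
// for one estimate per node until something is invalidated.
int query_rounds(GrowthCache &cache, const Node *nodes, int count, int rounds) {
  int total = 0;
  for (int r = 0; r < rounds; ++r)
    for (int i = 0; i < count; ++i) total += cache.growth(nodes[i]);
  return total;
}
```

In this sketch, five rounds of queries over three nodes run the estimator only three times; invalidating one node forces exactly one re-estimate for it.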
Thank you!
> >
> > I was never really happy about the double use there, and in fact the whole
> > fixed-point arithmetic in the badness computation is a mess. If we had a
> > template-based Fibonacci heap and a fast enough sreal, turning it all to
> > reals would save quite some maintenance burden.
>
> Yeah, well.
>
> Richard.
>
> > Honza
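The quoted remark about fixed-point badness can be illustrated with a hypothetical C++ sketch. `Candidate`, `fixed_badness`, `real_badness` and `order_by_badness` are invented for illustration, and `std::priority_queue` stands in for a template-based Fibonacci heap; none of this is GCC's `fibheap` or `sreal` code. It shows that truncating integer division at a small scale can collapse distinct growth/benefit ratios onto one key, whereas a heap templated on its key type can use `double` directly with no manual scaling.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical inlining candidate (not a GCC data structure): code growth
// caused by inlining and the estimated time it saves.
struct Candidate {
  int id;
  int64_t growth;
  int64_t time_saved;
};

// Fixed-point badness: scale growth/time_saved by 2^shift and truncate, as
// integer-only code must. Too small a shift collapses distinct ratios.
int64_t fixed_badness(const Candidate &c, int shift) {
  assert(c.time_saved > 0);
  return (c.growth << shift) / c.time_saved;
}

// Floating-point badness: the same ratio with no manual scaling.
double real_badness(const Candidate &c) {
  assert(c.time_saved > 0);
  return static_cast<double>(c.growth) / static_cast<double>(c.time_saved);
}

// Min-heap templated on the key type, so the same code serves either
// representation. Returns candidate ids in extraction order (lowest badness
// first; equal keys fall back to the smaller id).
template <typename Key, typename KeyFn>
std::vector<int> order_by_badness(const std::vector<Candidate> &cs, KeyFn key) {
  using Entry = std::pair<Key, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (const Candidate &c : cs) heap.emplace(key(c), c.id);
  std::vector<int> order;
  while (!heap.empty()) {
    order.push_back(heap.top().second);
    heap.pop();
  }
  return order;
}
```

With candidates `{0, 8, 3}` and `{1, 7, 3}`, the fixed-point key at shift 0 gives both a badness of 2, so the order falls back to ids, while the `double` key (or a larger shift) extracts candidate 1 (7/3) before candidate 0 (8/3).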