Re: [PATCH] Change the badness computation to ensure no integer-underflow

Jan Hubicka Wed, 28 Aug 2013 10:11:39 -0700

> > I am giving the patch brief benchmarking on profiledbootstrap, and if it
> > won't cause a major regression, I think we should go ahead with the patch.


Uhm, I profiledbootstrapped and we are a bit too fast to get a reasonable
oprofile.  What I get is:
7443      9.4372  lto1                     lto1                     lto_end_uncompression(lto_compression_stream*)
4438      5.6271  lto1                     lto1                     _ZL14DFS_write_treeP12output_blockP4sccsP9tree_nodebb.lto_priv.4993
2351      2.9809  lto1                     lto1                     lto_output_tree(output_block*, tree_node*, bool, bool)
2179      2.7628  lto1                     lto1                     _ZL30linemap_macro_loc_to_exp_pointP9line_mapsjPPK8line_map.lto_priv.7860
1910      2.4217  lto1                     lto1                     _ZL19unpack_value_fieldsP7data_inP9bitpack_dP9tree_node.lto_priv.7292
1855      2.3520  libc-2.11.1.so           libc-2.11.1.so           msort_with_tmp
1531      1.9412  lto1                     lto1                     streamer_string_index(output_block*, char const*, unsigned int, bool)
1530      1.9399  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
1471      1.8651  lto1                     lto1                     do_estimate_growth(cgraph_node*)
1306      1.6559  lto1                     lto1                     pointer_map_insert(pointer_map_t*, void const*)
1238      1.5697  lto1                     lto1                     _Z28streamer_pack_tree_bitfieldsP12output_blockP9bitpack_dP9tree_node.constprop.1086
1138      1.4429  lto1                     lto1                     compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
1082      1.3719  lto1                     lto1                     streamer_write_tree_body(output_block*, tree_node*, bool)
1044      1.3237  lto1                     lto1                     _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_j3vecIP9tree_node7va_heap6vl_ptrES7_S2_IP21ipa_agg_jump_function

WPA on GCC takes 12 seconds (with my fork patch):
Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :   4.48 (37%) usr   0.05 ( 6%) sys   4.57 (34%) wall   42983 kB ( 7%) ggc
 phase stream in         :   7.21 (60%) usr   0.26 (32%) sys   7.47 (56%) wall  565102 kB (93%) ggc
 phase stream out        :   0.38 ( 3%) usr   0.50 (62%) sys   1.37 (10%) wall     623 kB ( 0%) ggc
 callgraph optimization  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 ipa dead code removal   :   0.46 ( 4%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.36 ( 3%) usr   0.01 ( 1%) sys   0.41 ( 3%) wall   38261 kB ( 6%) ggc
 ipa inlining heuristics :   2.84 (24%) usr   0.05 ( 6%) sys   2.87 (21%) wall   60263 kB (10%) ggc
 ipa lto gimple in       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.04 ( 0%) usr   0.02 ( 2%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :   6.23 (52%) usr   0.18 (22%) sys   6.40 (48%) wall  425731 kB (70%) ggc
 ipa lto decl out        :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.22 ( 2%) usr   0.02 ( 2%) sys   0.25 ( 2%) wall   60840 kB (10%) ggc
 ipa lto decl merge      :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall    1051 kB ( 0%) ggc
 ipa lto cgraph merge    :   0.22 ( 2%) usr   0.01 ( 1%) sys   0.25 ( 2%) wall   17676 kB ( 3%) ggc
 whopr wpa               :   0.38 ( 3%) usr   0.00 ( 0%) sys   0.35 ( 3%) wall     626 kB ( 0%) ggc
 whopr wpa I/O           :   0.01 ( 0%) usr   0.47 (58%) sys   0.98 ( 7%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference           :   0.31 ( 3%) usr   0.01 ( 1%) sys   0.33 ( 2%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall     150 kB ( 0%) ggc
 ipa pure const          :   0.29 ( 2%) usr   0.00 ( 0%) sys   0.30 ( 2%) wall       0 kB ( 0%) ggc
 tree SSA incremental    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     203 kB ( 0%) ggc
 tree operand scan       :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall    3512 kB ( 1%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 :  12.08             0.81            13.43             610123 kB

Inlining heuristics was also around 25% w/o your change.  Timing matches my
experience with firefox - growth estimation tends to be the hot function; with
caching, badness is off the radar.  As such I think the patch is safe to go.
Thank you!


> >
> > I was never really happy about the double use there, and in fact the whole
> > fixed-point arithmetic in the badness computation is a mess.  If we had a
> > template-based Fibonacci heap and sreal were fast enough, turning it all to
> > reals would save quite some maintenance burden.
> 
> Yeah, well.
> 
> Richard.
> 
> > Honza
