https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
At -O1 we have

Samples: 2M of event 'cycles:u', Event count (approx.): 2983686432518
Overhead       Samples  Command  Shared Object     Symbol
  19.77%        467950  cc1      cc1               [.] bitmap_bit_p
  12.31%        300919  cc1      cc1               [.] wide_int_storage::operator=
   6.79%        158610  cc1      cc1               [.] gori_compute::may_recompute_p
   4.84%        113100  cc1      cc1               [.] ranger_cache::range_from_dom
   3.79%         88582  cc1      cc1               [.] bitmap_set_bit
   3.24%         75772  cc1      cc1               [.] block_range_cache::get_bb_range
   2.40%         56058  cc1      cc1               [.] get_immediate_dominator
   2.37%         55493  cc1      cc1               [.] gori_map::exports
   2.15%         50244  cc1      cc1               [.] gori_map::is_export_p
   1.87%         45710  cc1      cc1               [.] wide_int_storage::wide_int_storage
   1.73%         40436  cc1      cc1               [.] infer_range_manager::has_range_p
   1.70%         39586  cc1      cc1               [.] gimple_has_side_effects
   1.17%         28642  cc1      cc1               [.] irange_storage::get_irange
   1.13%         27004  cc1      cc1               [.] back_jt_path_registry::adjust_paths_after_duplication

So it's DOM's jump threader that takes the time.  Using -O1 -fno-thread-jumps
this improves a lot, to

Samples: 362K of event 'cycles:u', Event count (approx.): 441041461405
Overhead       Samples  Command  Shared Object     Symbol
  22.44%         78191  cc1      cc1               [.] wide_int_storage::operator=
  11.02%         38451  cc1      cc1               [.] bitmap_bit_p
   3.55%         12318  cc1      cc1               [.] dom_oracle::register_transitives
   3.45%         12016  cc1      cc1               [.] wide_int_storage::wide_int_storage
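
To put a number on "a lot", here is a rough sketch comparing the two
approximate event counts from the perf report headers above.  The variable
names are made up for illustration, and the counts are perf's own
approximations, so treat the result as a ballpark figure only.

```python
# Approximate 'cycles:u' event counts copied from the two perf report
# headers above (variable names are illustrative, not from the report).
cycles_o1 = 2_983_686_432_518          # -O1
cycles_no_thread = 441_041_461_405     # -O1 -fno-thread-jumps

# How many times fewer cycles the -fno-thread-jumps run needed.
speedup = cycles_o1 / cycles_no_thread

# Fraction of the -O1 cycles that disappear without jump threading.
saved = 1 - cycles_no_thread / cycles_o1

print(f"{speedup:.1f}x fewer cycles, {saved:.0%} of -O1 cycles saved")
```

By these counts, -fno-thread-jumps cuts user-space cycles by a bit under 7x.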

I'm going to try to collect a callgrind profile for -O1.
