https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
At -O1 we have

Samples: 2M of event 'cycles:u', Event count (approx.): 2983686432518
Overhead  Samples  Command  Shared Object  Symbol
  19.77%   467950  cc1      cc1            [.] bitmap_bit_p
  12.31%   300919  cc1      cc1            [.] wide_int_storage::operator=
   6.79%   158610  cc1      cc1            [.] gori_compute::may_recompute_p
   4.84%   113100  cc1      cc1            [.] ranger_cache::range_from_dom
   3.79%    88582  cc1      cc1            [.] bitmap_set_bit
   3.24%    75772  cc1      cc1            [.] block_range_cache::get_bb_range
   2.40%    56058  cc1      cc1            [.] get_immediate_dominator
   2.37%    55493  cc1      cc1            [.] gori_map::exports
   2.15%    50244  cc1      cc1            [.] gori_map::is_export_p
   1.87%    45710  cc1      cc1            [.] wide_int_storage::wide_int_storage
   1.73%    40436  cc1      cc1            [.] infer_range_manager::has_range_p
   1.70%    39586  cc1      cc1            [.] gimple_has_side_effects
   1.17%    28642  cc1      cc1            [.] irange_storage::get_irange
   1.13%    27004  cc1      cc1            [.] back_jt_path_registry::adjust_paths_after_duplication

so it's DOM's jump threader that takes the time. With -O1 -fno-thread-jumps this improves a lot:

Samples: 362K of event 'cycles:u', Event count (approx.): 441041461405
Overhead  Samples  Command  Shared Object  Symbol
  22.44%    78191  cc1      cc1            [.] wide_int_storage::operator=
  11.02%    38451  cc1      cc1            [.] bitmap_bit_p
   3.55%    12318  cc1      cc1            [.] dom_oracle::register_transitives
   3.45%    12016  cc1      cc1            [.] wide_int_storage::wide_int_storage

I'm going to try to collect a callgrind profile for -O1.
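To put "improves a lot" into numbers, the following Python sketch divides the two perf event counts and estimates per-symbol cycles by multiplying each symbol's overhead percentage by its profile's total. It is only a back-of-the-envelope estimate, assuming perf's approximate event counts are good enough for ratios; the helper `symbol_cycles` is hypothetical and not part of GCC or perf.

```python
# Approximate 'cycles:u' event counts from the two perf reports above.
O1_TOTAL = 2_983_686_432_518        # -O1
NO_THREAD_TOTAL = 441_041_461_405   # -O1 -fno-thread-jumps

# Overhead percentages for symbols that appear in both profiles.
O1_OVERHEAD = {
    "bitmap_bit_p": 19.77,
    "wide_int_storage::operator=": 12.31,
    "wide_int_storage::wide_int_storage": 1.87,
}
NO_THREAD_OVERHEAD = {
    "bitmap_bit_p": 11.02,
    "wide_int_storage::operator=": 22.44,
    "wide_int_storage::wide_int_storage": 3.45,
}


def symbol_cycles(total, overhead_pct):
    """Hypothetical helper: estimate cycles in one symbol from the total and its overhead %."""
    return total * overhead_pct / 100.0


# Overall reduction in cycles when jump threading is disabled.
overall_speedup = O1_TOTAL / NO_THREAD_TOTAL

# Per-symbol reduction: estimated cycles at -O1 divided by cycles with -fno-thread-jumps.
per_symbol = {}
for name, pct in O1_OVERHEAD.items():
    before = symbol_cycles(O1_TOTAL, pct)
    after = symbol_cycles(NO_THREAD_TOTAL, NO_THREAD_OVERHEAD[name])
    per_symbol[name] = before / after

if __name__ == "__main__":
    print(f"overall: {overall_speedup:.2f}x fewer cycles")
    for name, ratio in per_symbol.items():
        print(f"{name}: {ratio:.1f}x fewer cycles")
```

This works out to roughly 6.8x fewer cycles overall. bitmap_bit_p shrinks by about 12x while wide_int_storage::operator= shrinks by only about 3.7x, which is why the latter moves to the top of the second profile even though its absolute cost also dropped.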