https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80838
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-05
                 CC|                            |hubicka at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The same slowdown is seen when compiling combine.ii, too, so it is not caused
by tramp3d not being representative.

With LTO:

 Performance counter stats for './xgcc -B ./ /aux/hubicka/combine.ii -O2 -S':

       6078.849031      task-clock (msec)         #    0.998 CPUs utilized
               616      context-switches          #    0.101 K/sec
                 2      cpu-migrations            #    0.000 K/sec
            44,730      page-faults               #    0.007 M/sec
    18,048,640,894      cycles                    #    2.969 GHz
     3,077,757,657      stalled-cycles-frontend   #   17.05% frontend cycles idle
     5,647,264,874      stalled-cycles-backend    #   31.29% backend cycles idle
    14,751,268,913      instructions              #    0.82  insns per cycle
                                                  #    0.38  stalled cycles per insn
     3,221,027,271      branches                  #  529.875 M/sec
       146,203,951      branch-misses             #    4.54% of all branches

       6.088293716 seconds time elapsed

Without LTO:

 Performance counter stats for './xgcc -B ./ /aux/hubicka/combine.ii -O2 -S':

       5815.257284      task-clock (msec)         #    0.998 CPUs utilized
               594      context-switches          #    0.102 K/sec
                 3      cpu-migrations            #    0.001 K/sec
            44,736      page-faults               #    0.008 M/sec
    17,305,115,662      cycles                    #    2.976 GHz
     2,736,627,618      stalled-cycles-frontend   #   15.81% frontend cycles idle
     5,580,251,004      stalled-cycles-backend    #   32.25% backend cycles idle
    13,952,236,872      instructions              #    0.81  insns per cycle
                                                  #    0.40  stalled cycles per insn
     3,147,743,704      branches                  #  541.291 M/sec
       145,597,831      branch-misses             #    4.63% of all branches

       5.824996172 seconds time elapsed

Sadly profile seems sufficiently flat to make it hard to work out what is going
on.

With LTO:

Samples: 24K of event 'cycles', Event count (approx.): 18314340157
   2.67%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.53%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.19%  cc1plus  cc1plus            [.] df_worklist_dataflow
   1.07%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.05%  cc1plus  cc1plus            [.] ggc_internal_alloc
   1.04%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.01%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.92%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.86%  cc1plus  cc1plus            [.] record_reg_classes
   0.82%  cc1plus  cc1plus            [.] df_note_compute
   0.79%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.65%  cc1plus  cc1plus            [.] walk_tree_1
   0.63%  cc1plus  libc-2.19.so       [.] _int_free
   0.58%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.57%  cc1plus  cc1plus            [.] flags_from_decl_or_type
   0.56%  cc1plus  cc1plus            [.] cleanup_cfg
   0.55%  cc1plus  cc1plus            [.] wide_int_to_tree
   0.55%  cc1plus  cc1plus            [.] df_live_bb_local_compute
   0.54%  cc1plus  cc1plus            [.] et_set_father
   0.54%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.53%  cc1plus  cc1plus            [.] bitmap_copy
   0.53%  cc1plus  cc1plus            [.] operand_equal_p
   0.52%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.52%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.51%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.50%  cc1plus  cc1plus            [.] cse_insn
   0.47%  cc1plus  [kernel.kallsyms]  [k] clear_page

Without LTO:

Samples: 23K of event 'cycles', Event count (approx.): 17586270321
Overhead  Command  Shared Object      Symbol
   2.88%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.72%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.05%  cc1plus  cc1plus            [.] ggc_internal_alloc
   1.04%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.03%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.98%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.97%  cc1plus  cc1plus            [.] bitmap_ior_into
   0.88%  cc1plus  cc1plus            [.] record_operand_costs
   0.82%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.80%  cc1plus  cc1plus            [.] df_note_compute
   0.79%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.72%  cc1plus  libc-2.19.so       [.] _int_free
   0.71%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.70%  cc1plus  cc1plus            [.] run_fast_df_dce
   0.65%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.63%  cc1plus  cc1plus            [.] bitmap_copy
   0.62%  cc1plus  cc1plus            [.] walk_tree_1
   0.58%  cc1plus  cc1plus            [.] et_set_father
   0.55%  cc1plus  [kernel.kallsyms]  [k] clear_page
   0.53%  cc1plus  cc1plus            [.] df_live_local_compute
   0.51%  cc1plus  cc1plus            [.] flags_from_decl_or_type
   0.49%  cc1plus  cc1plus            [.] et_below
   0.47%  cc1plus  cc1plus            [.] df_analyze
   0.46%  cc1plus  cc1plus            [.] cse_insn
   0.46%  cc1plus  cc1plus            [.] cleanup_cfg
   0.45%  cc1plus  cc1plus            [.] inverted_post_order_compute
   0.45%  cc1plus  cc1plus            [.] constrain_operands
   0.44%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.43%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.43%  cc1plus  libc-2.19.so       [.] memset

Measuring instructions is slightly more precise.

   3.02%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.33%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.18%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.06%  cc1plus  cc1plus            [.] record_reg_classes
   1.06%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.03%  cc1plus  cc1plus            [.] bitmap_clear_bit
   1.02%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.94%  cc1plus  cc1plus            [.] df_note_compute
   0.94%  cc1plus  cc1plus            [.] df_insn_refs_collect
   0.88%  cc1plus  cc1plus            [.] ggc_internal_alloc
   0.86%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.82%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.77%  cc1plus  cc1plus            [.] operand_equal_p
   0.71%  cc1plus  libc-2.19.so       [.] _int_free
   0.68%  cc1plus  cc1plus            [.] inchash::add_expr
   0.68%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.68%  cc1plus  cc1plus            [.] walk_tree_1
   0.68%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.66%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.65%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.58%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.55%  cc1plus  cc1plus            [.] cleanup_cfg
   0.53%  cc1plus  cc1plus            [.] et_set_father
   0.51%  cc1plus  cc1plus            [.] constrain_operands
   0.50%  cc1plus  cc1plus            [.] cse_insn

non-lto:

   3.25%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.34%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.26%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.19%  cc1plus  cc1plus            [.] record_operand_costs
   1.02%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.00%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   1.00%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.99%  cc1plus  cc1plus            [.] df_note_compute
   0.94%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.91%  cc1plus  cc1plus            [.] ggc_internal_alloc
   0.84%  cc1plus  cc1plus            [.] df_insn_refs_collect
   0.81%  cc1plus  cc1plus            [.] inchash::add_expr
   0.76%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.76%  cc1plus  cc1plus            [.] operand_equal_p
   0.75%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.72%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.64%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.64%  cc1plus  cc1plus            [.] walk_tree_1
   0.63%  cc1plus  cc1plus            [.] et_set_father
   0.60%  cc1plus  cc1plus            [.] run_fast_df_dce
   0.59%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.57%  cc1plus  libc-2.19.so       [.] _int_free
   0.57%  cc1plus  cc1plus            [.] _cpp_lex_token
   0.55%  cc1plus  cc1plus            [.] cse_insn
   0.55%  cc1plus  cc1plus            [.] cleanup_tree_cfg
   0.54%  cc1plus  cc1plus            [.] find_costs_and_classes
   0.51%  cc1plus  cc1plus            [.] constrain_operands
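
For readers who want the size of the slowdown as a single figure per counter, here is a small Python sketch that computes the relative increase of the LTO run over the non-LTO run. The counter values are copied from the two perf stat outputs in this comment; the dictionary and function names are illustrative only and are not part of GCC or perf. On these numbers the LTO-built compiler needs a bit over 4% more cycles and a bit under 6% more instructions.

```python
# Counter values copied from the two `perf stat` runs quoted in this comment.
# Names below (WITH_LTO, WITHOUT_LTO, relative_increase) are illustrative only.
WITH_LTO = {
    "task-clock (msec)": 6078.849031,
    "cycles": 18_048_640_894,
    "instructions": 14_751_268_913,
    "branches": 3_221_027_271,
    "branch-misses": 146_203_951,
    "seconds elapsed": 6.088293716,
}

WITHOUT_LTO = {
    "task-clock (msec)": 5815.257284,
    "cycles": 17_305_115_662,
    "instructions": 13_952_236_872,
    "branches": 3_147_743_704,
    "branch-misses": 145_597_831,
    "seconds elapsed": 5.824996172,
}


def relative_increase(with_lto, without_lto):
    """Return, per counter, how much larger the LTO value is, in percent."""
    return {
        name: 100.0 * (with_lto[name] - without_lto[name]) / without_lto[name]
        for name in with_lto
    }


if __name__ == "__main__":
    for name, pct in relative_increase(WITH_LTO, WITHOUT_LTO).items():
        print(f"{name:20s} {pct:+6.2f}%")
```

The instruction count grows more than the cycle count, which fits the remark that measuring instructions is slightly more precise: it is less sensitive to run-to-run timing noise than cycles or elapsed time.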