https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80838

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-05
                 CC|                            |hubicka at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The same slowdown is seen when compiling combine.ii, too, so it is not caused
by tramp3d not being representative.  With LTO:

 Performance counter stats for './xgcc -B ./ /aux/hubicka/combine.ii -O2 -S':

       6078.849031      task-clock (msec)         #    0.998 CPUs utilized
               616      context-switches          #    0.101 K/sec
                 2      cpu-migrations            #    0.000 K/sec
            44,730      page-faults               #    0.007 M/sec
    18,048,640,894      cycles                    #    2.969 GHz
     3,077,757,657      stalled-cycles-frontend   #   17.05% frontend cycles idle
     5,647,264,874      stalled-cycles-backend    #   31.29% backend  cycles idle
    14,751,268,913      instructions              #    0.82  insns per cycle
                                                  #    0.38  stalled cycles per insn
     3,221,027,271      branches                  #  529.875 M/sec
       146,203,951      branch-misses             #    4.54% of all branches

       6.088293716 seconds time elapsed

Without LTO:

 Performance counter stats for './xgcc -B ./ /aux/hubicka/combine.ii -O2 -S':

       5815.257284      task-clock (msec)         #    0.998 CPUs utilized
               594      context-switches          #    0.102 K/sec
                 3      cpu-migrations            #    0.001 K/sec
            44,736      page-faults               #    0.008 M/sec
    17,305,115,662      cycles                    #    2.976 GHz
     2,736,627,618      stalled-cycles-frontend   #   15.81% frontend cycles idle
     5,580,251,004      stalled-cycles-backend    #   32.25% backend  cycles idle
    13,952,236,872      instructions              #    0.81  insns per cycle
                                                  #    0.40  stalled cycles per insn
     3,147,743,704      branches                  #  541.291 M/sec
       145,597,831      branch-misses             #    4.63% of all branches

       5.824996172 seconds time elapsed
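
As a rough aid to reading the two runs side by side, the relative differences
can be computed with a short script.  This is only a sketch: the counter values
are copied from the perf stat output above, and the helper name
relative_increase is illustrative, not part of any tool.

```python
# Counter values copied from the two perf stat runs above.
with_lto = {
    "cycles": 18_048_640_894,
    "instructions": 14_751_268_913,
    "branches": 3_221_027_271,
    "branch-misses": 146_203_951,
}
without_lto = {
    "cycles": 17_305_115_662,
    "instructions": 13_952_236_872,
    "branches": 3_147_743_704,
    "branch-misses": 145_597_831,
}


def relative_increase(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100.0


# Percentage by which each counter grows in the LTO-built compiler.
pct = {
    name: relative_increase(with_lto[name], without_lto[name])
    for name in with_lto
}

for name, value in pct.items():
    print(f"{name:>15}: {value:+.2f}%")
```

On these numbers the LTO build spends roughly 4% more cycles and somewhat under
6% more instructions, while instructions per cycle barely move (0.82 vs 0.81).
That suggests, without proving it, that the extra time comes from executing
more instructions rather than from stalling more on the same ones.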


Sadly, the profile is flat enough that it is hard to work out what is going
on.  With LTO:
Samples: 24K of event 'cycles', Event count (approx.): 18314340157              
   2.67%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.53%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.19%  cc1plus  cc1plus            [.] df_worklist_dataflow
   1.07%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.05%  cc1plus  cc1plus            [.] ggc_internal_alloc
   1.04%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.01%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.92%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.86%  cc1plus  cc1plus            [.] record_reg_classes
   0.82%  cc1plus  cc1plus            [.] df_note_compute
   0.79%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.65%  cc1plus  cc1plus            [.] walk_tree_1
   0.63%  cc1plus  libc-2.19.so       [.] _int_free
   0.58%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.57%  cc1plus  cc1plus            [.] flags_from_decl_or_type
   0.56%  cc1plus  cc1plus            [.] cleanup_cfg
   0.55%  cc1plus  cc1plus            [.] wide_int_to_tree
   0.55%  cc1plus  cc1plus            [.] df_live_bb_local_compute
   0.54%  cc1plus  cc1plus            [.] et_set_father
   0.54%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.53%  cc1plus  cc1plus            [.] bitmap_copy
   0.53%  cc1plus  cc1plus            [.] operand_equal_p
   0.52%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.52%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.51%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.50%  cc1plus  cc1plus            [.] cse_insn
   0.47%  cc1plus  [kernel.kallsyms]  [k] clear_page

Without LTO:

Samples: 23K of event 'cycles', Event count (approx.): 17586270321              
Overhead  Command  Shared Object      Symbol                                    
   2.88%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.72%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.05%  cc1plus  cc1plus            [.] ggc_internal_alloc
   1.04%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.03%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.98%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.97%  cc1plus  cc1plus            [.] bitmap_ior_into
   0.88%  cc1plus  cc1plus            [.] record_operand_costs
   0.82%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.80%  cc1plus  cc1plus            [.] df_note_compute
   0.79%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.72%  cc1plus  libc-2.19.so       [.] _int_free
   0.71%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.70%  cc1plus  cc1plus            [.] run_fast_df_dce
   0.65%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.63%  cc1plus  cc1plus            [.] bitmap_copy
   0.62%  cc1plus  cc1plus            [.] walk_tree_1
   0.58%  cc1plus  cc1plus            [.] et_set_father
   0.55%  cc1plus  [kernel.kallsyms]  [k] clear_page
   0.53%  cc1plus  cc1plus            [.] df_live_local_compute
   0.51%  cc1plus  cc1plus            [.] flags_from_decl_or_type
   0.49%  cc1plus  cc1plus            [.] et_below
   0.47%  cc1plus  cc1plus            [.] df_analyze
   0.46%  cc1plus  cc1plus            [.] cse_insn
   0.46%  cc1plus  cc1plus            [.] cleanup_cfg
   0.45%  cc1plus  cc1plus            [.] inverted_post_order_compute
   0.45%  cc1plus  cc1plus            [.] constrain_operands
   0.44%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.43%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.43%  cc1plus  libc-2.19.so       [.] memset



Measuring instructions rather than cycles is slightly more precise.  With LTO:

   3.02%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.33%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.18%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.06%  cc1plus  cc1plus            [.] record_reg_classes
   1.06%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.03%  cc1plus  cc1plus            [.] bitmap_clear_bit
   1.02%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.94%  cc1plus  cc1plus            [.] df_note_compute
   0.94%  cc1plus  cc1plus            [.] df_insn_refs_collect
   0.88%  cc1plus  cc1plus            [.] ggc_internal_alloc
   0.86%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   0.82%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.77%  cc1plus  cc1plus            [.] operand_equal_p
   0.71%  cc1plus  libc-2.19.so       [.] _int_free
   0.68%  cc1plus  cc1plus            [.] inchash::add_expr
   0.68%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.68%  cc1plus  cc1plus            [.] walk_tree_1
   0.68%  cc1plus  cc1plus            [.] fsm_find_thread_path
   0.66%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.65%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.58%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.55%  cc1plus  cc1plus            [.] cleanup_cfg
   0.53%  cc1plus  cc1plus            [.] et_set_father
   0.51%  cc1plus  cc1plus            [.] constrain_operands
   0.50%  cc1plus  cc1plus            [.] cse_insn

Without LTO:

   3.25%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.34%  cc1plus  cc1plus            [.] get_ref_base_and_extent
   1.26%  cc1plus  libc-2.19.so       [.] _int_malloc
   1.19%  cc1plus  cc1plus            [.] record_operand_costs
   1.02%  cc1plus  cc1plus            [.] bitmap_ior_into
   1.00%  cc1plus  cc1plus            [.] lra_create_live_ranges_1
   1.00%  cc1plus  cc1plus            [.] bitmap_clear_bit
   0.99%  cc1plus  cc1plus            [.] df_note_compute
   0.94%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn
   0.91%  cc1plus  cc1plus            [.] ggc_internal_alloc
   0.84%  cc1plus  cc1plus            [.] df_insn_refs_collect
   0.81%  cc1plus  cc1plus            [.] inchash::add_expr
   0.76%  cc1plus  cc1plus            [.] bitmap_bit_p
   0.76%  cc1plus  cc1plus            [.] operand_equal_p
   0.75%  cc1plus  cc1plus            [.] sched_analyze_insn
   0.72%  cc1plus  cc1plus            [.] reload_cse_simplify_operands
   0.64%  cc1plus  cc1plus            [.] (anonymous namespace)::dom_info::calc_idoms
   0.64%  cc1plus  cc1plus            [.] walk_tree_1
   0.63%  cc1plus  cc1plus            [.] et_set_father
   0.60%  cc1plus  cc1plus            [.] run_fast_df_dce
   0.59%  cc1plus  cc1plus            [.] df_worklist_dataflow
   0.57%  cc1plus  libc-2.19.so       [.] _int_free
   0.57%  cc1plus  cc1plus            [.] _cpp_lex_token
   0.55%  cc1plus  cc1plus            [.] cse_insn
   0.55%  cc1plus  cc1plus            [.] cleanup_tree_cfg
   0.54%  cc1plus  cc1plus            [.] find_costs_and_classes
   0.51%  cc1plus  cc1plus            [.] constrain_operands
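
Because the profiles are this flat, comparing them by eye is error-prone.  One
possible way to line them up is sketched below.  It assumes the perf report
rows have been saved as text with one row per line; parse_profile and
diff_profiles are hypothetical helper names, and only a few rows from the
instruction profiles above are used as sample input.

```python
import re

# Matches a perf report row: "  3.02%  cc1plus  cc1plus  [.] bitmap_set_bit"
ROW = re.compile(r"^\s*([\d.]+)%\s+\S+\s+\S+\s+\[[.k]\]\s+(.+?)\s*$")


def parse_profile(text):
    """Map symbol name -> overhead percentage for each perf report row."""
    profile = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            profile[match.group(2)] = float(match.group(1))
    return profile


def diff_profiles(a, b):
    """Return (symbol, a_pct, b_pct) rows, largest absolute change first.

    A symbol missing from one listing is reported as None.  For sorting it
    is treated as 0.0, which is an assumption: the listing is truncated, so
    the symbol may simply be below the cutoff.
    """
    rows = [(sym, a.get(sym), b.get(sym)) for sym in sorted(set(a) | set(b))]
    rows.sort(key=lambda r: -abs((r[1] or 0.0) - (r[2] or 0.0)))
    return rows


# A few rows copied from the instruction profiles above.
lto = parse_profile("""
   3.02%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.06%  cc1plus  cc1plus            [.] record_reg_classes
   1.02%  cc1plus  cc1plus            [.] df_worklist_dataflow
""")
non_lto = parse_profile("""
   3.25%  cc1plus  cc1plus            [.] bitmap_set_bit
   1.19%  cc1plus  cc1plus            [.] record_operand_costs
   0.59%  cc1plus  cc1plus            [.] df_worklist_dataflow
""")

for symbol, lto_pct, non_lto_pct in diff_profiles(lto, non_lto):
    print(f"{symbol:<28} lto={lto_pct} non-lto={non_lto_pct}")
```

The percentages are shares of each run's own total, so a difference between
the two columns is only a hint about where to look, not a measured cost.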
