https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85964

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-29
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Started with r255973, which introduced the GCC unroll pragma. With 128 changed
to 32, compilation takes ~5s on a Haswell machine.

time report:
time gcc pr85954.c -c -ftracer -fno-guess-branch-probability -O3 -ftime-report

Time variable                                   usr           sys          wall               GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    1247 kB (  6%)
 phase opt and generate             :   5.42 (100%)   0.02 (100%)   5.44 (100%)   21204 kB ( 94%)
 phase finalize                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 CFG verifier                       :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 tree CFG cleanup                   :   3.40 ( 63%)   0.00 (  0%)   3.43 ( 63%)     773 kB (  3%)
 tree VRP                           :   0.01 (  0%)   0.00 (  0%)   0.03 (  1%)    1605 kB (  7%)
 tree copy propagation              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      23 kB (  0%)
 tree PTA                           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 tree SSA rewrite                   :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 tree SSA incremental               :   0.17 (  3%)   0.00 (  0%)   0.16 (  3%)    2336 kB ( 10%)
 tree operand scan                  :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     645 kB (  3%)
 dominator optimization             :   0.07 (  1%)   0.01 ( 50%)   0.06 (  1%)    1834 kB (  8%)
 backwards jump threading           :   1.49 ( 27%)   0.00 (  0%)   1.49 ( 27%)       0 kB (  0%)
 tree FRE                           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       2 kB (  0%)
 tree loop invariant motion         :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 complete unrolling                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    2741 kB ( 12%)
 tree vectorization                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)    1412 kB (  6%)
 tree SSA verifier                  :   0.03 (  1%)   0.00 (  0%)   0.04 (  1%)       0 kB (  0%)
 tree STMT verifier                 :   0.03 (  1%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 tree strlen optimization           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 dominance frontiers                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 dominance computation              :   0.02 (  0%)   0.01 ( 50%)   0.01 (  0%)       0 kB (  0%)
 loop init                          :   0.04 (  1%)   0.00 (  0%)   0.02 (  0%)     128 kB (  1%)
 CPROP                              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      20 kB (  0%)
 tracer                             :   0.03 (  1%)   0.00 (  0%)   0.02 (  0%)    7352 kB ( 33%)
 combiner                           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     105 kB (  0%)
 tree loop if-conversion            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     663 kB (  3%)
 rest of compilation                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     387 kB (  2%)
 repair loop structures             :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)       0 kB (  0%)
 TOTAL                              :   5.43          0.02          5.45          22612 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    0m5.468s
user    0m5.441s
sys     0m0.027s

Perf report:
# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ......
#
    31.15%  cc1      cc1               [.] et_splay
    13.65%  cc1      cc1               [.] fsm_find_thread_path
     8.80%  cc1      cc1               [.] iterate_fix_dominators
     4.75%  cc1      cc1               [.] hash_table<default_hash_traits<basic_block_def*>, xcallocator>::find_empty_slot_for_expand
     3.38%  cc1      cc1               [.] thread_jumps::handle_phi
     2.78%  cc1      cc1               [.] thread_jumps::fsm_find_control_statement_thread_paths
     2.70%  cc1      cc1               [.] bitmap_set_bit
     2.47%  cc1      cc1               [.] graphds_dfs
     2.33%  cc1      cc1               [.] et_root
     2.01%  cc1      cc1               [.] hash_table<default_hash_traits<basic_block_def*>, xcallocator>::expand
     1.81%  cc1      cc1               [.] et_below
     1.39%  cc1      libc-2.27.so      [.] _int_malloc
     1.35%  cc1      cc1               [.] add_edge
