https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85964
Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-29
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Started when we introduced the GCC unroll pragma: r255973.
After changing 128 to 32, it takes ~5s on a Haswell machine.

time report:

time gcc pr85954.c -c -ftracer -fno-guess-branch-probability -O3 -ftime-report

Time variable                     usr          sys         wall           GGC
 phase setup                : 0.00 (  0%)  0.00 (  0%)  0.01 (  0%)    1247 kB (  6%)
 phase opt and generate     : 5.42 (100%)  0.02 (100%)  5.44 (100%)   21204 kB ( 94%)
 phase finalize             : 0.01 (  0%)  0.00 (  0%)  0.00 (  0%)       0 kB (  0%)
 CFG verifier               : 0.02 (  0%)  0.00 (  0%)  0.02 (  0%)       0 kB (  0%)
 tree CFG cleanup           : 3.40 ( 63%)  0.00 (  0%)  3.43 ( 63%)     773 kB (  3%)
 tree VRP                   : 0.01 (  0%)  0.00 (  0%)  0.03 (  1%)    1605 kB (  7%)
 tree copy propagation      : 0.00 (  0%)  0.00 (  0%)  0.01 (  0%)      23 kB (  0%)
 tree PTA                   : 0.01 (  0%)  0.00 (  0%)  0.00 (  0%)       0 kB (  0%)
 tree SSA rewrite           : 0.02 (  0%)  0.00 (  0%)  0.01 (  0%)       0 kB (  0%)
 tree SSA incremental       : 0.17 (  3%)  0.00 (  0%)  0.16 (  3%)    2336 kB ( 10%)
 tree operand scan          : 0.00 (  0%)  0.00 (  0%)  0.02 (  0%)     645 kB (  3%)
 dominator optimization     : 0.07 (  1%)  0.01 ( 50%)  0.06 (  1%)    1834 kB (  8%)
 backwards jump threading   : 1.49 ( 27%)  0.00 (  0%)  1.49 ( 27%)       0 kB (  0%)
 tree FRE                   : 0.01 (  0%)  0.00 (  0%)  0.01 (  0%)       2 kB (  0%)
 tree loop invariant motion : 0.00 (  0%)  0.00 (  0%)  0.01 (  0%)       0 kB (  0%)
 complete unrolling         : 0.01 (  0%)  0.00 (  0%)  0.00 (  0%)    2741 kB ( 12%)
 tree vectorization         : 0.01 (  0%)  0.00 (  0%)  0.01 (  0%)    1412 kB (  6%)
 tree SSA verifier          : 0.03 (  1%)  0.00 (  0%)  0.04 (  1%)       0 kB (  0%)
 tree STMT verifier         : 0.03 (  1%)  0.00 (  0%)  0.02 (  0%)       0 kB (  0%)
 tree strlen optimization   : 0.00 (  0%)  0.00 (  0%)  0.01 (  0%)       0 kB (  0%)
 dominance frontiers        : 0.01 (  0%)  0.00 (  0%)  0.01 (  0%)       0 kB (  0%)
 dominance computation      : 0.02 (  0%)  0.01 ( 50%)  0.01 (  0%)       0 kB (  0%)
 loop init                  : 0.04 (  1%)  0.00 (  0%)  0.02 (  0%)     128 kB (  1%)
 CPROP                      : 0.01 (  0%)  0.00 (  0%)  0.00 (  0%)      20 kB (  0%)
 tracer                     : 0.03 (  1%)  0.00 (  0%)  0.02 (  0%)    7352 kB ( 33%)
 combiner                   : 0.00 (  0%)  0.00 (  0%)  0.01 (  0%)     105 kB (  0%)
 tree loop if-conversion    : 0.01 (  0%)  0.00 (  0%)  0.01 (  0%)     663 kB (  3%)
 rest of compilation        : 0.01 (  0%)  0.00 (  0%)  0.01 (  0%)     387 kB (  2%)
 repair loop structures     : 0.01 (  0%)  0.00 (  0%)  0.02 (  0%)       0 kB (  0%)
 TOTAL                      : 5.43         0.02         5.45          22612 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    0m5.468s
user    0m5.441s
sys     0m0.027s

Perf report:

# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    31.15%  cc1      cc1            [.] et_splay
    13.65%  cc1      cc1            [.] fsm_find_thread_path
     8.80%  cc1      cc1            [.] iterate_fix_dominators
     4.75%  cc1      cc1            [.] hash_table<default_hash_traits<basic_block_def*>, xcallocator>::find_empty_slot_for_expand
     3.38%  cc1      cc1            [.] thread_jumps::handle_phi
     2.78%  cc1      cc1            [.] thread_jumps::fsm_find_control_statement_thread_paths
     2.70%  cc1      cc1            [.] bitmap_set_bit
     2.47%  cc1      cc1            [.] graphds_dfs
     2.33%  cc1      cc1            [.] et_root
     2.01%  cc1      cc1            [.] hash_table<default_hash_traits<basic_block_def*>, xcallocator>::expand
     1.81%  cc1      cc1            [.] et_below
     1.39%  cc1      libc-2.27.so   [.] _int_malloc
     1.35%  cc1      cc1            [.] add_edge
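
For readers unfamiliar with the pragma mentioned in the comment, here is a minimal sketch of how `#pragma GCC unroll` is written, assuming GCC 8 or later. This is not the testcase from this PR, which is not reproduced here. The function name `sum_first` and the loop body are hypothetical and only illustrate where the pragma goes and where the unroll factor (128 or 32 in the comment) appears.

```c
/* Hypothetical illustration only; NOT the testcase attached to this PR. */

/* Sum the first n elements of a.  The pragma must appear immediately
   before the loop it applies to and asks GCC to unroll that loop by
   the given factor.  */
int
sum_first (const int *a, int n)
{
  int s = 0;

#pragma GCC unroll 32
  for (int i = 0; i < n; i++)
    s += a[i];

  return s;
}
```

A file like this can be compiled with the same options as in the report (`-c -ftracer -fno-guess-branch-probability -O3 -ftime-report`) to see the per-pass breakdown. There is no claim that this small loop reproduces the slow compile.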