https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482

            Bug ID: 119482
           Summary: slow compilation on
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ak at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60892&action=edit
input file

This is a file from the Ladybird browser. It uses the flatten attribute. With
flatten, gcc compilation is a lot slower (40+s) than clang (6s). The Ladybird
developers had to disable it to keep the CI from timing out.
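
For readers unfamiliar with the attribute, here is a minimal sketch of what
flatten requests. It is not taken from the Ladybird source: the function names
are hypothetical, and the real interpreter function is far larger. As documented
by GCC, the attribute asks that every call inside the marked function be
inlined, including the calls that inlining introduces, where possible. A single
flattened function can therefore grow very large.

```cpp
// Hypothetical helpers standing in for an interpreter's small handlers.
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

// flatten asks the compiler to inline the calls in this function's body
// (and, recursively, the calls inside those callees) where possible.
[[gnu::flatten]] int dispatch(int x)
{
    return twice(add_one(x));
}
```

The attribute only changes how the function is compiled, not what it computes.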

It doesn't look like a problem with the inliner; the file just seems to be
hitting general scaling limits. The profile is still fairly flat, but the top
hot functions seem to be ranger and SSA related.

time g++-15 -ftime-report -std=gnu++20 -O2 interpreter.i  -S -w

Time variable                                  wall           GGC
 phase setup                        :   0.00 (  0%)  1952k (  0%)
 phase parsing                      :   0.73 (  2%)   237M ( 25%)
 phase lang. deferred               :   0.28 (  1%)    57M (  6%)
 phase opt and generate             :  41.67 ( 98%)   651M ( 69%)
 |name lookup                       :   0.12 (  0%)    11M (  1%)
 |overload resolution               :   0.29 (  1%)    64M (  7%)
 garbage collection                 :   0.39 (  1%)     0  (  0%)
 dump files                         :   0.02 (  0%)     0  (  0%)
 callgraph construction             :   0.11 (  0%)    21M (  2%)
 callgraph optimization             :   0.18 (  0%)    52k (  0%)
 callgraph functions expansion      :  36.11 ( 85%)   369M ( 39%)
 callgraph ipa passes               :   5.23 ( 12%)   234M ( 25%)
 ipa function summary               :   0.08 (  0%)  3583k (  0%)
 ipa dead code removal              :   0.02 (  0%)     0  (  0%)
 ipa cp                             :   0.11 (  0%)  3444k (  0%)
 ipa inlining heuristics            :   0.19 (  0%)    13M (  1%)
 ipa function splitting             :   0.02 (  0%)   842k (  0%)
 ipa reference                      :   0.01 (  0%)     0  (  0%)
 ipa pure const                     :   0.04 (  0%)    32k (  0%)
 ipa icf                            :   0.04 (  0%)  2176  (  0%)
 ipa SRA                            :   0.04 (  0%)   738k (  0%)
 ipa modref                         :   0.03 (  0%)   793k (  0%)
 cfg construction                   :   0.02 (  0%)  2180k (  0%)
 cfg cleanup                        :   0.78 (  2%)  5539k (  1%)
 trivially dead code                :   0.12 (  0%)     0  (  0%)
 df scan insns                      :   0.11 (  0%)    20k (  0%)
 df reaching defs                   :   1.37 (  3%)     0  (  0%)
 df live regs                       :   3.91 (  9%)     0  (  0%)
 df live&initialized regs           :   4.46 ( 10%)     0  (  0%)
 df must-initialized regs           :   0.02 (  0%)     0  (  0%)
 df use-def / def-use chains        :   0.20 (  0%)     0  (  0%)
 df live reg subwords               :   0.04 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.61 (  1%)  5459k (  1%)
 register information               :   0.20 (  0%)     0  (  0%)
 alias analysis                     :   0.26 (  1%)    10M (  1%)
 alias stmt walking                 :   2.70 (  6%)  2275k (  0%)
 register scan                      :   0.03 (  0%)   107k (  0%)
 rebuild jump labels                :   0.07 (  0%)     0  (  0%)
 preprocessing                      :   0.03 (  0%)  1942k (  0%)
 parser (global)                    :   0.07 (  0%)    51M (  5%)
 parser struct body                 :   0.10 (  0%)    31M (  3%)
 parser function body               :   0.07 (  0%)    11M (  1%)
 parser inl. func. body             :   0.04 (  0%)  7108k (  1%)
 parser inl. meth. body             :   0.14 (  0%)    34M (  4%)
 template instantiation             :   0.52 (  1%)   150M ( 16%)
 constant expression evaluation     :   0.03 (  0%)  3423k (  0%)
 constraint satisfaction            :   0.02 (  0%)  2596k (  0%)
 early inlining heuristics          :   0.06 (  0%)  9725k (  1%)
 inline parameters                  :   0.12 (  0%)  6552k (  1%)
 integration                        :   0.76 (  2%)   178M ( 19%)
 tree gimplify                      :   0.08 (  0%)    15M (  2%)
 tree eh                            :   0.07 (  0%)  6680k (  1%)
 tree CFG construction              :   0.02 (  0%)  8370k (  1%)
 tree CFG cleanup                   :   1.04 (  2%)   600k (  0%)
 tree tail merge                    :   0.07 (  0%)  2549k (  0%)
 tree VRP                           :   0.57 (  1%)  5979k (  1%)
 tree Early VRP                     :   0.34 (  1%)  7126k (  1%)
 tree copy propagation              :   0.20 (  0%)   111k (  0%)
 tree PTA                           :   1.73 (  4%)  5483k (  1%)
 tree SSA rewrite                   :   0.03 (  0%)  4929k (  1%)
 tree SSA incremental               :   0.88 (  2%)    17M (  2%)
 tree operand scan                  :   0.12 (  0%)    24M (  3%)
 dominator optimization             :   0.87 (  2%)    18M (  2%)
 backwards jump threading           :   0.53 (  1%)  4951k (  1%)
 tree SRA                           :   0.21 (  1%)    11M (  1%)
 isolate eroneous paths             :   0.02 (  0%)  2352  (  0%)
 tree CCP                           :   0.51 (  1%)  1631k (  0%)
 tree split crit edges              :   0.01 (  0%)  3009k (  0%)
 tree reassociation                 :   0.07 (  0%)   168k (  0%)
 tree PRE                           :   0.48 (  1%)    12M (  1%)
 tree FRE                           :   0.92 (  2%)  6295k (  1%)
 tree RPO VN                        :   0.07 (  0%)   330k (  0%)
 tree code sinking                  :   0.10 (  0%)  6166k (  1%)
 tree linearize phis                :   0.08 (  0%)  1636k (  0%)
 tree backward propagate            :   0.01 (  0%)     0  (  0%)
 tree forward propagate             :   0.27 (  1%)  2431k (  0%)
 tree phiprop                       :   0.01 (  0%)     0  (  0%)
 tree conservative DCE              :   0.18 (  0%)   288k (  0%)
 tree aggressive DCE                :   0.15 (  0%)  4133k (  0%)
 tree DSE                           :   0.30 (  1%)  4487k (  0%)
 PHI merge                          :   0.01 (  0%)   479k (  0%)
 tree loop invariant motion         :   0.10 (  0%)    76k (  0%)
 tree canonical iv                  :   0.01 (  0%)   361k (  0%)
 complete unrolling                 :   0.09 (  0%)   974k (  0%)
 tree slp vectorization             :   0.20 (  0%)    28M (  3%)
 tree loop distribution             :   0.01 (  0%)    45k (  0%)
 tree iv optimization               :   0.04 (  0%)  1761k (  0%)
 tree copy headers                  :   0.09 (  0%)   922k (  0%)
 tree SSA uncprop                   :   0.03 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.02 (  0%)   278k (  0%)
 tree strlen optimization           :   0.06 (  0%)   554k (  0%)
 tree modref                        :   0.06 (  0%)  2558k (  0%)
 dominance frontiers                :   0.09 (  0%)     0  (  0%)
 dominance computation              :   0.92 (  2%)     0  (  0%)
 control dependences                :   0.01 (  0%)     0  (  0%)
 out of ssa                         :   0.07 (  0%)   181k (  0%)
 expand vars                        :   0.23 (  1%)  7554k (  1%)
 expand                             :   0.27 (  1%)    46M (  5%)
 post expand cleanups               :   0.05 (  0%)  2218k (  0%)
 lower subreg                       :   0.04 (  0%)  1607k (  0%)
 forward prop                       :   0.36 (  1%)   682k (  0%)
 CSE                                :   0.54 (  1%)  2209k (  0%)
 dead code elimination              :   0.12 (  0%)     0  (  0%)
 dead store elim1                   :   0.21 (  0%)  3492k (  0%)
 dead store elim2                   :   0.17 (  0%)  4180k (  0%)
 loop init                          :   0.60 (  1%)    11M (  1%)
 loop invariant motion              :   0.02 (  0%)  7656  (  0%)
 loop unrolling                     :   0.06 (  0%)   214k (  0%)
 loop fini                          :   0.02 (  0%)   131k (  0%)
 CPROP                              :   0.17 (  0%)  3932k (  0%)
 PRE                                :   0.08 (  0%)   417k (  0%)
 CSE 2                              :   0.39 (  1%)  1973k (  0%)
 branch prediction                  :   0.10 (  0%)  1896k (  0%)
 combiner                           :   0.66 (  2%)    15M (  2%)
 late combiner                      :   0.35 (  1%)   749k (  0%)
 if-conversion                      :   0.10 (  0%)  1905k (  0%)
 integrated RA                      :   1.63 (  4%)    30M (  3%)
 LRA non-specific                   :   0.53 (  1%)  2955k (  0%)
 LRA virtuals elimination           :   0.08 (  0%)  2964k (  0%)
 LRA reload inheritance             :   0.02 (  0%)   482k (  0%)
 LRA create live ranges             :   1.91 (  4%)   369k (  0%)
 LRA hard reg assignment            :   0.03 (  0%)     0  (  0%)
 LRA rematerialization              :   0.18 (  0%)   736  (  0%)
 reload CSE regs                    :   0.45 (  1%)  6837k (  1%)
 ree                                :   0.05 (  0%)    48k (  0%)
 thread pro- & epilogue             :   0.12 (  0%)  1256k (  0%)
 if-conversion 2                    :   0.04 (  0%)    23k (  0%)
 combine stack adjustments          :   0.03 (  0%)     0  (  0%)
 peephole 2                         :   0.07 (  0%)   746k (  0%)
 hard reg cprop                     :   0.12 (  0%)    34k (  0%)
 scheduling 2                       :   1.00 (  2%)  2445k (  0%)
 machine dep reorg                  :   0.09 (  0%)  6656  (  0%)
 reorder blocks                     :   0.14 (  0%)  3097k (  0%)
 duplicate computed gotos           :   0.01 (  0%)  1565k (  0%)
 shorten branches                   :   0.11 (  0%)   432  (  0%)
 final                              :   0.22 (  1%) 10000k (  1%)
 tree if-combine                    :   0.02 (  0%)   249k (  0%)
 if to switch conversion            :   0.02 (  0%)  1104  (  0%)
 straight-line strength reduction   :   0.02 (  0%)    43k (  0%)
 store merging                      :   0.05 (  0%)  1623k (  0%)
 access analysis                    :   0.10 (  0%)    25k (  0%)
 ext dce                            :   0.03 (  0%)     0  (  0%)
 fold mem offsets                   :   0.04 (  0%)   404k (  0%)
 rest of compilation                :   0.66 (  2%)  4816k (  0%)
 remove unused locals               :   0.27 (  1%)    12k (  0%)
 address taken                      :   0.13 (  0%)   448  (  0%)
 rebuild frequencies                :   0.04 (  0%)   470k (  0%)
 TOTAL                              :  42.70          947M

real    0m42.741s
user    0m42.099s
sys     0m0.496s

     3.03%  cc1plus  cc1plus            [.] bitmap_ior_into(bitmap_head*, bitmap_head const*)
            |
            |--2.40%--bitmap_ior_into(bitmap_head*, bitmap_head const*)
            |          |
            |          |--1.08%--0x1be573b
            |          |          |
            |          |           --0.62%--0x1be439d
            |          |                     df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
            |          |
            |           --0.51%--0x1c9eb64
            |                     df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
            |
             --0.61%--gori_map::calculate_gori(basic_block_def*)
                       ranger_cache::ranger_cache(int, bool)
                       |
                        --0.58%--gimple_ranger::gimple_ranger(bool)

     2.99%  cc1plus  cc1plus            [.] 0x00000000015a7203
            |
            ---0x19a71d0
               |
               |--1.64%--bitmap_ior_into(bitmap_head*, bitmap_head const*)
               |          |
               |           --1.04%--0x1be573b
               |
                --0.79%--bitmap_ior_and_compl(bitmap_head*, bitmap_head const*, bitmap_head const*, bitmap_head const*)
                          |
                           --0.74%--0x1be4246
                                     df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)


(The backtrace seems to be corrupted on that one, so I truncated it. Callers
may not be correct):

     2.72%  cc1plus  cc1plus            [.] bitmap_set_bit(bitmap_head*, int)
            |          
             --2.64%--bitmap_set_bit(bitmap_head*, int)
                       |          
                        --0.56%--0x1bffbda
                                  rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
                                  execute_ranger_vrp(function*, bool)
                                  execute_one_pass(opt_pass*)
                                  execute_pass_list(function*, opt_pass*)
                                  ...

     2.15%  cc1plus  cc1plus            [.] bitmap_and_into(bitmap_head*, bitmap_head const*)
            |
             --2.14%--bitmap_and_into(bitmap_head*, bitmap_head const*)
                       |
                        --1.29%--0x1c9e90f
                                  df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)

     1.84%  cc1plus  cc1plus            [.] bitmap_bit_p(bitmap_head const*, int)
            |
             --1.78%--bitmap_bit_p(bitmap_head const*, int)
                       |
                        --0.86%--ranger_cache::block_range(vrange&, basic_block_def*, tree_node*, bool)

     1.50%  cc1plus  cc1plus            [.] get_ref_base_and_extent(tree_node*, poly_int<1u, long>*, poly_int<1u, long>*, poly_int<1u, long>*, bool*)
            |
             --1.49%--get_ref_base_and_extent(tree_node*, poly_int<1u, long>*, poly_int<1u, long>*, poly_int<1u, long>*, bool*)
                       |
                        --0.86%--0x1b15873
                                  stmt_may_clobber_ref_p_1(gimple*, ao_ref*, bool)

     1.20%  cc1plus  cc1plus            [.] pre_and_rev_post_order_compute_fn(function*, int*, int*, bool)
            |
             --0.95%--pre_and_rev_post_order_compute_fn(function*, int*, int*, bool)
