https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67783
Bug ID: 67783 Summary: quadratic time consumption in IPA inlining with -O1 and higher Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: skvadrik at gmail dot com Target Milestone: --- Created attachment 36423 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36423&action=edit 1.c, real1.c Consider the following code (1.c): static void f (const char * const * p) {} void g (const char * p) { loop: if (*p++) goto loop; if (*p++) goto loop; // ... if (*p++) goto loop; f (&p); } Compilation time with -O1 grows quadratically in the number of repetitions of statement 'if (*p++) goto loop;'. For example, 80 repetitions result in 11s: $ time gcc -c 1.c -O1 real 0m11.134s user 0m11.097s sys 0m0.013s 96 repetitions result in 30s, 120 repetitions result in 1m20s and so on. See file '1.c' in attach. Also see file 'real1.c' in attach for an example of real-world program (autogenerated by re2c) from which '1.c' was cut down (it exhibits quadratic behaviour with -O2 and higher). I tried to analyze a bit. Here's what I found: Time report: $ gcc -c 1.c -O1 -ftime-report Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1066 kB (29%) ggc phase opt and generate : 11.44 (100%) usr 0.00 ( 0%) sys 11.47 (100%) wall 2385 kB (66%) ggc ipa inlining heuristics : 11.34 (99%) usr 0.00 ( 0%) sys 11.36 (99%) wall 12 kB ( 0%) ggc df reaching defs : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc df live&initialized regs: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc register information : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc alias stmt walking : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree CFG cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree split crit edges : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 37 kB ( 1%) ggc scev constant prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 33 kB ( 1%) ggc tree SSA verifier : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc loop init : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 1595 kB (44%) ggc branch prediction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 6 kB ( 0%) ggc initialize rtl : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 12 kB ( 0%) ggc TOTAL : 11.44 0.00 11.48 3617 kB Extra diagnostic checks enabled; compiler may run slowly. Configure with --enable-checking=release to disable checks. So, it must be IPA inlining. First,`valgrind --tool=callgrind` showed that 99% of time was spent in 'estimate_function_body_sizes' from 'gcc/gcc/ipa-inline-analysis.c'.I further tracked it down to loop on line 2759: https://github.com/gcc-mirror/gcc/blob/master/gcc/ipa-inline-analysis.c#L2759 and inner loop on line 2791: https://github.com/gcc-mirror/gcc/blob/master/gcc/ipa-inline-analysis.c#L2791 This nested loop causes quadratic time consumption. Second, if I comment out 'f (&p);' in the end of '1.c', file compiles instantly, but the number of iterations in nested loop remains quadratic. So the problem is not in quadratic number of iteration, it's in the loop's body. Note that the function 'f' is empty and its argument is immutable, so it cannot change 'p' and influence control flow. This is what `perf record`/`perf report` shows: 14.38% cc1 cc1 [.] find_var_scev_info 13.58% cc1 cc1 [.] analyze_scalar_evolution_1 11.25% cc1 cc1 [.] loop_preheader_edge 8.56% cc1 cc1 [.] instantiate_scev_r 7.83% cc1 cc1 [.] interpret_rhs_expr 6.82% cc1 cc1 [.] htab_find_slot_with_hash 6.65% cc1 cc1 [.] analyze_scalar_evolution 5.63% cc1 cc1 [.] useless_type_conversion_p 5.62% cc1 cc1 [.] chrec_convert_1 3.06% cc1 cc1 [.] compute_scalar_evolution_in_loop 2.98% cc1 cc1 [.] chrec_contains_symbols_defined_in_loop 1.90% cc1 cc1 [.] instantiate_scev 1.80% cc1 cc1 [.] eq_idx_scev_info 1.57% cc1 cc1 [.] flow_loop_nested_p 1.36% cc1 cc1 [.] dominated_by_p 1.16% cc1 cc1 [.] is_gimple_min_invariant 1.02% cc1 cc1 [.] compute_overall_effect_of_inner_loop 0.88% cc1 cc1 [.] flow_bb_inside_loop_p 0.83% cc1 cc1 [.] extract_ops_from_tree_1 0.51% cc1 cc1 [.] superloop_at_depth `valgrind --tool=callgrind` suggests that 'analyze_scalar_evolution' was called ~50000000 times.