https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253
            Bug ID: 102253
           Summary: scalability issues with large loop depth
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

While investigating an improvement to LIM's fill_always_executed_in, I created
the following testcase, which creates a loop nest of depth N with
conditionally executed subloops.

extern void foobar (int);

// Level a: loop over i only if bit (a % 32) of b is set, then recurse.
template <int a>
struct bar
{
  static void baz(int b, int)
  {
    if (b & (1 << (a % 32)))
      for (int i = 0; i < 1024; ++i)
        bar<a-1>::baz (b, i);
  }
};

// Innermost level: the call at the bottom of the nest.
template <>
struct bar<0>
{
  static void baz (int, int i) { foobar (i); }
};

// flatten asks GCC to inline every call in foo recursively, so the
// bar<N> ... bar<0> chain becomes a single loop nest of depth N.
void __attribute__((flatten)) foo(int b)
{
#ifndef N
#define N 10
#endif
  bar<N>::baz (b, 0);
}

For N == 900 (the maximum unless you also specify -ftemplate-depth) and -O1,
-ftime-report shows

Time variable              usr           sys          wall           GGC
 tree canonical iv    :   1.42 ( 13%)   0.00 (  0%)   1.42 ( 13%)    28M ( 13%)
 complete unrolling   :   2.80 ( 27%)   0.00 (  0%)   2.81 ( 26%)    42M ( 19%)
 integrated RA        :   3.41 ( 32%)   0.32 ( 80%)   3.72 ( 34%)   640k (  0%)
 TOTAL                :  10.54          0.40         10.96          224M

For N == 1800 and -O1 it is already

Time variable              usr           sys          wall           GGC
 tree canonical iv    :  30.43 ( 28%)   0.05 ( 14%)  30.50 ( 28%)   116M ( 15%)
 complete unrolling   :  63.96 ( 59%)   0.06 ( 17%)  64.04 ( 59%)   175M ( 22%)
 tree iv optimization :   5.75 (  5%)   0.00 (  0%)   5.77 (  5%)   126M ( 16%)
 integrated RA        :   1.40 (  1%)   0.12 ( 34%)   1.53 (  1%)  1754k (  0%)
 TOTAL                : 108.35          0.35        108.75          796M

For reference, compile time with N == 450 is 2.5s, with

Time variable              usr           sys          wall           GGC
 tree canonical iv    :   0.18 (  7%)   0.00 (  0%)   0.19 (  7%)  6904k ( 10%)
 complete unrolling   :   0.34 ( 14%)   0.00 (  0%)   0.34 ( 13%)  8412k ( 13%)