https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We no longer unroll the inner loops in cunrolli because cunrolli will leave us
with exit checks. We fail to compute the number of iterations of the inner
loop(s) (pre loop header copying):

  <bb 5> [local count: 21065692]:
  L.5:
  _3 = _1 + 1;
  _53 = (integer(kind=8)) _3;
  _4 = _1 + 2;
  _54 = (integer(kind=8)) _4;
  _55 = (integer(kind=8)) i1_25;
  _5 = _55 * 81;
  _56 = _5 + -91;

$3 = <basic_block 0x7ffff689a478 (5)>
(gdb) p debug_bb_n (7)
  <bb 7> [local count: 63197075]:
  _6 = S.0_27 * 9;
  _57 = _6 + _56;

  <bb 8> [local count: 189610187]:
  # S.1_28 = PHI <_53(7), S.1_59(9)>
  if (S.1_28 > _54)
    goto <bb 10>; [33.33%]
  else
    goto <bb 9>; [66.67%]

$1 = <basic_block 0x7ffff689a5b0 (8)>
(gdb) p debug_bb_n (9)
  <bb 9> [local count: 126413112]:
  _7 = S.1_28 + _57;
  _8 = test_array[_7];
  _9 = _8 + -10;
  test_array[_7] = _9;
  S.1_59 = S.1_28 + 1;
  goto <bb 8>; [100.00%]

This one is a bit difficult, but the other one (though not as interesting(?)):

  <bb 17> [local count: 119292717]:
  L.14:
  _14 = _1 + 1;
  _69 = (integer(kind=8)) _14;
  _15 = _1 + 2;
  _70 = (integer(kind=8)) _15;
  _71 = (integer(kind=8)) i2_26;
  _16 = _71 * 81;
  _72 = _16 + -91;

  # S.4_31 = PHI <_69(19), S.4_75(21)>
  if (S.4_31 > _70)
    goto <bb 22>; [33.33%]
  else
    goto <bb 21>; [66.67%]

  <bb 21> [local count: 715863674]:
  _18 = S.4_31 + _73;
  _19 = test_array[_18];
  _20 = _19 + 10;
  test_array[_18] = _20;
  S.4_75 = S.4_31 + 1;
  goto <bb 20>; [100.00%]

looks like it should be doable. And indeed it is - we are just "confused" by
the maybe_zero test: the loop runs from _1 + 1 to _1 + 2, so it iterates either
zero times or a constant number of times. IMHO we should allow constant zero or
N iterations by performing the loop header copying alongside the unrolling
(leaving the first exit test unremoved).

Testing a patch to do that.
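
To illustrate the idea at source level, here is a rough C sketch of what "header copying alongside unrolling" would amount to for a loop like the one in <bb 8>/<bb 9>. It is not the patch and not GCC code. The function names, the parameters (lo, hi, base) and the assumption that the trip count is either zero or exactly two are hypothetical, chosen only to mirror the S.1_28 / _53 / _54 / _57 pattern in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Rolled form, mirroring the dump: the induction variable starts at lo
   (like _53), the loop exits once it exceeds hi (like _54), and base
   plays the role of the invariant offset _57. */
void update_rolled(int *test_array, int64_t lo, int64_t hi, int64_t base)
{
    for (int64_t s = lo; s <= hi; s++)
        test_array[s + base] += -10;
}

/* Hypothetical result of copying the loop header and fully unrolling,
   under the ASSUMPTION that the loop runs either zero times or exactly
   twice (hi == lo + 1). The first exit test stays in place and covers
   the zero-iteration case; the remaining exit tests are dropped. */
void update_unrolled(int *test_array, int64_t lo, int64_t hi, int64_t base)
{
    if (lo > hi)            /* retained first exit test (maybe_zero case) */
        return;
    assert(hi == lo + 1);   /* the assumed constant trip count of 2 */
    test_array[lo + base] += -10;
    test_array[lo + 1 + base] += -10;
}
```

The only check left in the unrolled version is the copied header test, which is the "first exit test unremoved" part; everything after it is straight-line code.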