https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91975

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2019-10-04
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
f1 and g1 are detected as memcpy by loop-distribution.

f0 is unrolled completely by late unrolling:

Loop size: 10
Estimated size after unrolling: 10

while g0 is not:

Loop size: 8
Estimated size after unrolling: 20

so the size estimation doesn't quite "work" here.

f0 body before unrolling:

  <bb 3> [local count: 954449108]:
  # i_14 = PHI <0(2), i_10(4)>
  # prephitmp_19 = PHI <0(2), pretmp_18(4)>
  # ivtmp_3 = PHI <8(2), ivtmp_13(4)>
  _1 = (long unsigned int) i_14;
  _2 = _1 * 4;
  _4 = &b + _2;
  *_4 = prephitmp_19;
  i_10 = i_14 + 1;
  ivtmp_13 = ivtmp_3 - 1;
  if (ivtmp_13 != 0)
    goto <bb 4>; [87.50%]
  else
    goto <bb 5>; [12.50%]

  <bb 4> [local count: 835156388]:
  _12 = (long unsigned int) i_10;
  _11 = _12 * 4;
  _16 = &a + _11;
  pretmp_18 = MEM[(const int *)_16];
  goto <bb 3>; [100.00%]

g0 body:

  <bb 3> [local count: 954449108]:
  # s_16 = PHI <&a(2), s_7(4)>
  # d_17 = PHI <&b(2), d_8(4)>
  # i_18 = PHI <0(2), i_10(4)>
  # prephitmp_4 = PHI <0(2), pretmp_5(4)>
  # ivtmp_3 = PHI <8(2), ivtmp_1(4)>
  s_7 = s_16 + 4;
  d_8 = d_17 + 4;
  *d_17 = prephitmp_4;
  i_10 = i_18 + 1;
  ivtmp_1 = ivtmp_3 - 1;
  if (ivtmp_1 != 0)
    goto <bb 4>; [87.50%]
  else
    goto <bb 5>; [12.50%]

  <bb 4> [local count: 835156388]:
  pretmp_5 = MEM[(const int *)s_16 + 4B];
  goto <bb 3>; [100.00%]

for g0 we do not think that the s_7 = s_16 + 4 are going to be optimized
"away" but for f0 we think that _4 = &b + _2 will.  Those are actually
the same.
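
For orientation, the two GIMPLE bodies are consistent with source loops of roughly the following shape. This is a hypothetical reconstruction inferred from the dumps (an 8-iteration copy from a const int array a into b, once by index and once by pointer), not the testcase attached to the PR; the initializer values and the check_copy helper are made up for illustration.

```c
#include <assert.h>

/* Hypothetical reconstruction, NOT the testcase from the PR.  Only a[0] == 0
   is suggested by the dumps (the PHI argument 0 on the entry edge); the
   other initializer values are invented.  */
const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
int b[8];

/* Indexed copy: each address is recomputed from i, as in
   _4 = &b + _2 in the f0 dump.  */
void
f0 (void)
{
  for (unsigned i = 0; i != 8; ++i)
    b[i] = a[i];
}

/* Pointer copy: s and d are induction variables of their own, as in
   s_7 = s_16 + 4 and d_8 = d_17 + 4 in the g0 dump.  */
void
g0 (void)
{
  const int *s = a;
  int *d = b;
  for (unsigned i = 0; i != 8; ++i)
    *d++ = *s++;
}

/* Illustration-only helper: both variants must leave b equal to a.  */
void
check_copy (void)
{
  for (int i = 0; i < 8; ++i)
    assert (b[i] == a[i]);
}
```

Both functions compute the same result, which is why differing unroll decisions for them point at the size estimate rather than at the loops themselves.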

diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 5952cad7bba..d38959c3aa2 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -195,9 +195,8 @@ constant_after_peeling (tree op, gimple *stmt, class loop *loop)
   /* Induction variables are constants when defined in loop.  */
   if (loop_containing_stmt (stmt) != loop)
     return false;
-  tree ev = analyze_scalar_evolution (loop, op);
-  if (chrec_contains_undetermined (ev)
-      || chrec_contains_symbols (ev))
+  tree ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, op));
+  if (chrec_contains_undetermined (ev))
     return false;
   return true;
 }

fixes this but we still end up with

size: 8-6, last_iteration: 7-6
  Loop size: 8
  Estimated size after unrolling: 10
Not unrolling loop 1: size would grow.

and not unrolling because the not unrolled estimate is lower than that
for f0 (that costs &a + i * 4 as 2 while g0 has IV + 4).

I'm testing the above anyway.
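
The numbers in that dump can be reproduced with a small sketch of the size estimate. It assumes the estimate is (copies * (size - eliminated) + remaining last-iteration cost) scaled by 2/3, which is how I recall estimated_unrolled_size in tree-ssa-loop-ivcanon.c working; the struct and function names are illustrative stand-ins, and the count of 7 unrolled copies is an assumption (8 iterations, so 7 latch executions).

```c
#include <assert.h>

/* Illustrative stand-in for the unroller's size record.  The dump line
   "size: 8-6, last_iteration: 7-6" is read as overall-eliminated pairs.  */
struct loop_size_sketch
{
  int overall;
  int eliminated_by_peeling;
  int last_iteration;
  int last_iteration_eliminated_by_peeling;
};

/* Assumed formula: what survives in each unrolled copy, plus what
   survives in the last iteration, with a 2/3 discount for follow-up
   optimizations; never less than 1.  */
long
estimated_unrolled_size_sketch (const struct loop_size_sketch *size,
                                long nunroll)
{
  long unr_insns = nunroll * (size->overall - size->eliminated_by_peeling);
  unr_insns += (size->last_iteration
                - size->last_iteration_eliminated_by_peeling);
  unr_insns = unr_insns * 2 / 3;
  return unr_insns <= 0 ? 1 : unr_insns;
}

/* "size would grow": the unrolled estimate exceeds the rolled loop size.  */
int
size_would_grow (const struct loop_size_sketch *size, long nunroll)
{
  return estimated_unrolled_size_sketch (size, nunroll) > size->overall;
}

/* g0 after the patch: 7 * (8 - 6) + (7 - 6) = 15, and 15 * 2 / 3 = 10.  */
void
reproduce_g0_numbers (void)
{
  struct loop_size_sketch g0_size = { 8, 6, 7, 6 };
  assert (estimated_unrolled_size_sketch (&g0_size, 7) == 10);
  assert (size_would_grow (&g0_size, 7));
}
```

Under these assumptions g0 reaches the same unrolled estimate as f0 (10), but its rolled size is 8 rather than 10, so the no-growth check rejects it while accepting f0.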