SUBROUTINE FOO (K) INTEGER I, J, K, A(5,5), B COMMON A A(1,1) = 1 10 B = 0 DO 30 I = 1, K DO 20 J = 1, K B = B + A(I,J) 20 CONTINUE A(I,I) = A(I,I) * 2 30 CONTINUE IF (B.GE.3) RETURN GO TO 10 END SUBROUTINE
PROGRAM BAR CALL FOO (2) END is miscompiled on redhat/gcc-4_1-branch, with -O2 -ftree-loop-linear, I believe on all arches, tested at least ppc -m32 and x86_64 -m64. Unfortunately, I couldn't reproduce this on gcc-4_1-branch nor on the trunk, because the statements coming from earlier passes are slightly different, though IMHO valid in all 3 cases while what -ftree-loop-linear creates is clearly invalid. I believe it is a latent bug in lambda-code.c (see below), although I don't have any testcases that would prove it on non-vendor branches. All vars used in foo are int4, the body in *.empty is on redhat/gcc-4_1-branch: <bb 0>: __BLNK__.a[0] = 1; D.684_14 = *k_13; goto <bb 2> (<L24>); __label_000010:; <L24>:; if (D.684_14 > 0) goto <L28>; else goto __label_000010; <L28>:; # b_5 = PHI <0(3), b_37(8)>; # i_1 = PHI <1(3), i_28(8)>; <L25>:; pretmp.29_7 = i_1 + -6; # j_9 = PHI <1(4), j_35(6)>; # b_6 = PHI <b_5(4), b_33(6)>; <L3>:; D.697_29 = j_9 * 5; D.698_30 = pretmp.29_7; D.699_31 = pretmp.29_7 + D.697_29; D.700_32 = __BLNK__.a[D.699_31]; b_33 = D.700_32 + b_6; j_35 = j_9 + 1; if (j_9 == D.684_14) goto <L8>; else goto <L30>; <L30>:; goto <bb 5> (<L3>); # b_37 = PHI <b_33(5)>; <L8>:; D.701_18 = i_1 * 5; D.702_19 = pretmp.29_7; D.703_20 = pretmp.29_7 + D.701_18; D.704_24 = __BLNK__.a[D.703_20]; D.705_25 = D.704_24 * 2; __BLNK__.a[D.703_20] = D.705_25; if (i_1 == D.684_14) goto <L13>; else goto <L31>; <L31>:; i_28 = i_1 + 1; goto <bb 4> (<L25>); # b_8 = PHI <b_37(7)>; <L13>:; if (b_8 > 2) goto __return_foo; else goto __label_000010; __return_foo:; return; On gcc-4_1-branch the difference is just that pretmp.29_7 doesn't exist, all users of that have i_1 + -6 instead. On the trunk, pretmp var is only used in the innermost loop (L3..L30) and after L8 label it instead uses i_1 * 6 + -6. Now, -ftree-loop-linear on the above determines the L3..L30 loop isn't perfectly nested in L25..L31 but can_convert_to_perfect_nest says it can be converted to perfect nest, because the pretmp.9_7 = i_1 + -6; satisfies: /* If the IV is simple, it can be duplicated. */ if (!automatically_generated_chrec_p (scev)) { tree step = evolution_part_in_loop_num (scev, loop->num); if (step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST) continue; } -ftree-loop-linear then changes the loops, so that the b computation is done using 2 nested loops and the *2 multiplication in a separate loop. Unfortunately, it doesn't compute the start of the loop correctly, above, the first __BLKN__.a[] that is multiplied by 2 is pretmp.29_7 + i_1 * 5 = i_1 * 6 - 6 where i_1 starts at 1, but after the transformation it is perfectiv.32_12 * 5 + perfectiv.32_12: ... # perfectiv.32_12 = PHI <1(11), perfectiv.32_39(14)>; <L34>:; <bb 13>: uboundvar.33_40 = D.684_14 + 1; perfectiv.32_39 = perfectiv.32_12 + 1; D.701_18 = perfectiv.32_12 * 5; D.702_19 = perfectiv.32_12; D.703_20 = perfectiv.32_12 + D.701_18; D.704_24 = __BLNK__.a[D.703_20]; D.705_25 = D.704_24 * 2; __BLNK__.a[D.703_20] = D.705_25; if (uboundvar.33_40 >= perfectiv.32_39) goto <L35>; else goto <L13>; <L35>:; goto <bb 12> (<L34>); can_convert_to_perfect_nest only looks at evolution part of the scev: if (TREE_CODE (stmt) == MODIFY_EXPR) { use_operand_p use_a, use_b; imm_use_iterator imm_iter; ssa_op_iter op_iter, op_iter1; tree op0 = TREE_OPERAND (stmt, 0); tree scev = instantiate_parameters (loop, analyze_scalar_evolution (loop, op0)); /* If the IV is simple, it can be duplicated. */ if (!automatically_generated_chrec_p (scev)) { tree step = evolution_part_in_loop_num (scev, loop->num); if (step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST) continue; } and replace_uses_equiv_to_x_with_y happily replaces vars with the perfectiv if the evolution part of the scev is constant and equal to the loop step: static void replace_uses_equiv_to_x_with_y (struct loop *loop, tree stmt, tree x, int xstep, tree y) { ssa_op_iter iter; use_operand_p use_p; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE) { tree use = USE_FROM_PTR (use_p); tree step = NULL_TREE; tree scev = instantiate_parameters (loop, analyze_scalar_evolution (loop, use)); if (scev != NULL_TREE && scev != chrec_dont_know) step = evolution_part_in_loop_num (scev, loop->num); if ((step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST && int_cst_value (step) == xstep) || USE_FROM_PTR (use_p) == x) SET_USE (use_p, y); } } For USE_FROM_PTR (use_p) == x that's fine and if i1 = initial_condition_in_loop_num (scev, loop->num) is equal to i2 = initial_condition_in_loop_num (instantiate_parameters (loop, analyze_scalar_evolution (loop, x))) value of y that's fine too. But if it is not, I believe we need to replace use with y + i1 - i2. In the testcase I posted, USE's initial_condition_in_loop_num is -5 and X's initial_condition_in_loop_num is 1. And, adding -5 - 1 to perfectiv (aka Y in the above routine) is exactly what would cure this testcase. Do you agree with this? Not sure how hard is that though, other alternative would be to make can_convert_to_perfect_nest more conservative. -- Summary: Latent bug in 4.1/4.2/4.3 lambda-code.c Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: jakub at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29581