https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112281
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
ldist distributes the loop nest, which looks like

  <bb 3> [local count: 118111600]:
  # c.6_23 = PHI <_4(9), 2(2)>
  # ivtmp_1 = PHI <ivtmp_27(9), 2(2)>
  _13 = c.6_23 + 1;

  <bb 4> [local count: 955630224]:
  # e.4_22 = PHI <_2(10), 0(3)>
  # ivtmp_28 = PHI <ivtmp_3(10), 2(3)>
  b = d[_13];
  d[c.6_23] = b;
  d[_13].a = 0;
  _2 = e.4_22 + 1;
  ivtmp_3 = ivtmp_28 - 1;
  if (ivtmp_3 != 0)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510900]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  _4 = c.6_23 + -1;
  ivtmp_27 = ivtmp_1 - 1;
  if (ivtmp_27 != 0)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

so there is no evolution of the DR indices in the inner loop.  _4 is
also dead and is eliminated by the transform.  -fdisable-tree-ivcanon
avoids the dead code but doesn't change the fact we miscompile this.

One interesting fact is that we have

Creating dr for d[c.6_23]
...
        base_object: d
        Access function 0: {2, +, -1}_1

but

Creating dr for d[_13].a
...
        base_object: d
        Access function 0: 32
        Access function 1: {3, +, -1}_1

but we do seem to get along here, computing

(compute_affine_dependence
  ref_a: d[c.6_23], stmt_a: d[c.6_23] = b;
  ref_b: d[_13].a, stmt_b: d[_13].a = 0;
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0 )
)
 (reversed)

OTOH we compute the same distance for

(compute_affine_dependence
  ref_a: d[_13], stmt_a: b = d[_13];
  ref_b: d[c.6_23], stmt_b: d[c.6_23] = b;
(analyze_overlapping_iterations
  (chrec_a = {3, +, -1}_1)
  (chrec_b = {2, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(analyze_overlapping_iterations
  (chrec_a = {2, +, -1}_1)
  (chrec_b = {3, +, -1}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [0 + 1 * x_1])
  (overlaps_b = [1 + 1 * x_1]))
)
  (overlap_iterations_a = [0 + 1 * x_1])
  (overlap_iterations_b = [1 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1 0 )
)

I think that

              /* If the overlap is exact preserve stmt order.  */
              else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
                                            DDR_NB_LOOPS (ddr)))
                ;

is not good enough.  If the dependence distance is zero between any
two iterations we have to preserve execution order.  That means the
innermost loop may not have zero dependence distance.  This was
adjusted also for PR87022.
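
To make the ordering problem concrete, here is a small C sketch of the loop nest as it can be read back from the GIMPLE dump above.  It is an illustration only, not the testcase attached to this PR: the element type `struct S`, the array size and the initial values are assumptions, and the order of the two partitions in `run_distributed` is one assumed outcome of distribution, not taken from an ldist dump.  With distance vector (1 0) the two stores conflict in the same inner iteration, so splitting the inner loop changes what `b` ends up holding.

```c
#include <assert.h>

/* Hypothetical element type: some storage precedes 'a', so d[i].a sits
   at a non-zero offset.  The real layout is not shown in the comment.  */
struct S { int pad; int a; };

enum { N = 4 };

/* Statement order as in the dump: the load/copy and the store to
   d[c + 1].a stay together in one inner loop.  */
int run_original(void)
{
    /* Assumed initial values, chosen so the difference is visible.  */
    struct S d[N] = {{0, 0}, {0, 0}, {0, 0}, {0, 5}};
    struct S b = {0, 0};

    for (int c = 2; c != 0; c--) {
        assert(c + 1 < N);          /* d[c + 1] must stay in bounds */
        for (int e = 0; e < 2; e++) {
            b = d[c + 1];
            d[c] = b;
            d[c + 1].a = 0;
        }
    }
    return b.a;
}

/* Assumed distribution: the inner loop is split into a copy loop
   followed by a loop doing only the store to d[c + 1].a.  */
int run_distributed(void)
{
    struct S d[N] = {{0, 0}, {0, 0}, {0, 0}, {0, 5}};
    struct S b = {0, 0};

    for (int c = 2; c != 0; c--) {
        assert(c + 1 < N);
        for (int e = 0; e < 2; e++) {
            b = d[c + 1];
            d[c] = b;
        }
        for (int e = 0; e < 2; e++)
            d[c + 1].a = 0;
    }
    return b.a;
}
```

Under these assumptions `run_original()` returns 0, because the second inner iteration re-reads d[c + 1] after its `a` member was cleared, while `run_distributed()` returns 5, because every copy runs before any clearing store.  The inner loop does not move the accessed elements at all, which is the zero innermost distance described above.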