https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109184
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testcase with just the essential stuff.

static int g_1731[7] = { 42, 0, 0, 0, 0, 0, 42 };

void __attribute__((noipa))
foo ()
{
  int l_1930[5] = { 0, };
  for (int i = 0; i < 15; ++i)
    for (int j = 4; (j >= 1); j -= 1)
#pragma GCC unroll 0
      for (int k = 0; (k <= 4); k += 1)
        g_1731[(j + 1)] = --l_1930[k];
}

int main()
{
  foo ();
  if (g_1731[0] != 42
      || g_1731[1] != 0
      || g_1731[2] != -60
      || g_1731[3] != -59
      || g_1731[4] != -58
      || g_1731[5] != -57
      || g_1731[6] != 42)
    __builtin_abort ();
  return 0;
}

The innermost loop body then is

  <bb 3> [local count: 894749066]:
  # k_26 = PHI <k_17(11), 0(5)>
  # ivtmp_23 = PHI <ivtmp_21(11), 5(5)>
  _1 = l_1930[k_26];
  _2 = _1 + -1;
  l_1930[k_26] = _2;
  g_1731[_6] = _2;
  k_17 = k_26 + 1;
  ivtmp_21 = ivtmp_23 - 1;
  if (ivtmp_21 != 0)

One should note that for data dependence analysis we'd usually need to
treat scalars (in this case SSA names) as arrays of the size of the whole
nest iteration domain and the dependences would be between statements,
not reads/writes.  So the above is

  _1 = l_1930[k_26];
  _2[i] = _1 + -1;
  l_1930[k_26] = _2[i];
  g_1731[_6] = _2[i];

then and when we interchange the loop we suddenly need two different
_2[] elements and when eliminating _2[] there's a dependence between
the l_1930 store and the implied load from a different iteration.

Note that when l_1930[k] wouldn't be stored to g_1731[j+1] the interchange
would be of course valid and we do not want to break that case.
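To make the "scalars as arrays over the iteration domain" view concrete, here is a minimal sketch in plain C. It is not GCC code and does not model what the interchange pass actually does. It only writes the testcase's loop nest twice: once as in the source, and once with the temporary _2 expanded into an array with one element per (i, j, k) iteration. The names run_reference, run_expanded, check_both, t2 and the NI/NJ/NK constants are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Iteration counts of the three loops in the testcase. */
enum { NI = 15, NJ = 4, NK = 5 };

/* The loop nest from the testcase, without the pragma or attribute. */
void run_reference(int g[7])
{
    int l[NK] = { 0 };
    for (int i = 0; i < NI; ++i)
        for (int j = 4; j >= 1; j -= 1)
            for (int k = 0; k <= 4; k += 1)
                g[j + 1] = --l[k];
}

/* Same nest with the scalar _2 expanded to one element per iteration
   (hypothetical array t2).  Every use of _2 now names the element that
   belongs to its own iteration, which is the form in which dependences
   between statements can be read off.  */
void run_expanded(int g[7])
{
    static int t2[NI][NJ][NK];
    int l[NK] = { 0 };
    for (int i = 0; i < NI; ++i)
        for (int j = 4; j >= 1; j -= 1)
            for (int k = 0; k <= 4; k += 1) {
                int t1 = l[k];              /* _1 = l_1930[k]      */
                t2[i][4 - j][k] = t1 - 1;   /* _2[it] = _1 + -1    */
                l[k] = t2[i][4 - j][k];     /* l_1930[k] = _2[it]  */
                g[j + 1] = t2[i][4 - j][k]; /* g_1731[j+1] = _2[it] */
            }
}

/* Run both forms and check them against the values main() expects. */
void check_both(void)
{
    static const int expected[7] = { 42, 0, -60, -59, -58, -57, 42 };
    int a[7] = { 42, 0, 0, 0, 0, 0, 42 };
    int b[7] = { 42, 0, 0, 0, 0, 0, 42 };
    run_reference(a);
    run_expanded(b);
    assert(memcmp(a, expected, sizeof a) == 0);
    assert(memcmp(b, expected, sizeof b) == 0);
}
```

Calling check_both() passes for both forms. In the expanded form the element read by the l_1930 store and the g_1731 store is tied to one iteration, so any transform that reorders iterations has to keep each store paired with the t2 element of the same iteration.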