http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-22 17:13:26 UTC ---
Your testcase doesn't resemble the original; the inner for loops need to clear
their iteration variable.

double C[10][4], B[10][10], A[10][4];

void
test (void)
{
  int i = 0, j = 0, l = 0;
  for (l = 0; l < 10; l++)
    for (j = 0; j < 10; j += 2)
      for (i = 0; i < 4; i++)
        {
          C[j + 0][i] = C[j + 0][i] + A[l][i] * B[j + 0][l];
          C[j + 1][i] = C[j + 1][i] + A[l][i] * B[j + 1][l];
        }
}

is IMHO just a matter of whether graphite can -floop-interchange this or not.
If you manually swap the l and j for lines, the generated code looks better,
though for some reason we unroll even the l loop, which increases register
pressure too much.
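
For reference, the manual swap of the l and j for lines could look roughly
like the sketch below. The function names (kernel_l_outer, kernel_j_outer,
kernels_agree, self_check), the N and M macros, and passing C as a parameter
are assumptions made for illustration; they are not part of the testcase. A
and B are filled with small integers so that every sum is exact and the two
loop orders can be compared with ==. It only checks that the interchange keeps
the results; it says nothing about the generated code.

```c
#include <assert.h>
#include <string.h>

#define N 10   /* rows of C and B, trip count of the l loop */
#define M 4    /* columns of C and A */

double A[N][M], B[N][N];

/* Loop order from the testcase: l outermost. */
void kernel_l_outer(double C[N][M])
{
  int i, j, l;
  for (l = 0; l < N; l++)
    for (j = 0; j < N; j += 2)
      for (i = 0; i < M; i++)
        {
          C[j + 0][i] = C[j + 0][i] + A[l][i] * B[j + 0][l];
          C[j + 1][i] = C[j + 1][i] + A[l][i] * B[j + 1][l];
        }
}

/* Manually interchanged: the l and j for lines are swapped.  Each
   C[j][i] still accumulates its terms in increasing l order. */
void kernel_j_outer(double C[N][M])
{
  int i, j, l;
  for (j = 0; j < N; j += 2)
    for (l = 0; l < N; l++)
      for (i = 0; i < M; i++)
        {
          C[j + 0][i] = C[j + 0][i] + A[l][i] * B[j + 0][l];
          C[j + 1][i] = C[j + 1][i] + A[l][i] * B[j + 1][l];
        }
}

/* Run both orders on the same input; return 1 if all results match. */
int kernels_agree(void)
{
  static double C1[N][M], C2[N][M];
  int i, j;

  /* Hypothetical input data: small integers, exactly representable. */
  for (j = 0; j < N; j++)
    for (i = 0; i < M; i++)
      A[j][i] = j + i;
  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
      B[j][i] = j - i;

  memset(C1, 0, sizeof C1);
  memset(C2, 0, sizeof C2);
  kernel_l_outer(C1);
  kernel_j_outer(C2);

  for (j = 0; j < N; j++)
    for (i = 0; i < M; i++)
      if (C1[j][i] != C2[j][i])
        return 0;
  return 1;
}

/* Convenience wrapper: aborts if the two loop orders disagree. */
void self_check(void)
{
  assert(kernels_agree());
}
```

Calling self_check () from a small main is enough to confirm the two orders
compute the same C before comparing the assembly for each kernel.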