------- Comment #23 from spop at gcc dot gnu dot org 2010-03-09 22:31 -------

Thanks for reducing this testcase.
On the further reduced kernel:

      SUBROUTINE SPECTOP(Dr,N)
      IMPLICIT REAL*8(A-H,o-Z)
      DIMENSION d1(0:32,0:32) , Dr(0:32,0:32) , x(0:32)
      REAL*8 Dr
      DO k = 1 , N-1
         d1(k,0) = fctr1/(t*(x(k)-x(0)))
         fctr = -fctr1
         DO j = 1 , N-1
            if (j.ne.k) d1(k,j) = fctr/(x(k)-x(j))
            fctr = -fctr
         ENDDO
         d1(k,N) = fctr/(t*(x(k)-x(N)))
         fctr1 = -fctr1
      ENDDO
      Dr(k,j) = d1(N-k,N-j)
      CONTINUE
      END

We have this CLAST generated by CLooG:

if (T_10 >= 3) {
  S4(0) ;
  S5(0,0) ;
  S7(0,0) ;
  for (scat_3=1;scat_3<=T_10-2;scat_3++) {
    S5(0,scat_3) ;
    S6(0,scat_3) ;
    S7(0,scat_3) ;
  }
  S9(0) ;
  S14(0) ;
}
for (scat_1=1;scat_1<=T_10-3;scat_1++) {
  S4(scat_1) ;
  S5(scat_1,scat_1) ;
  S7(scat_1,scat_1) ;
  for (scat_3=scat_1+1;scat_3<=T_10-2;scat_3++) {
    S5(scat_1,scat_3) ;
    S6(scat_1,scat_3) ;
    S7(scat_1,scat_3) ;
  }
  for (scat_3=0;scat_3<=scat_1-1;scat_3++) {
    S5(scat_1,scat_3) ;
    S6(scat_1,scat_3) ;
    S7(scat_1,scat_3) ;
  }
  S9(scat_1) ;
  S14(scat_1) ;
}
if (T_10 >= 3) {
  S4(T_10-2) ;
  S5(T_10-2,T_10-2) ;
  S7(T_10-2,T_10-2) ;
  for (scat_3=0;scat_3<=T_10-3;scat_3++) {
    S5(T_10-2,scat_3) ;
    S6(T_10-2,scat_3) ;
    S7(T_10-2,scat_3) ;
  }
  S9(T_10-2) ;
  S14(T_10-2) ;
}
if (T_10 == 2) {
  S4(0) ;
  S5(0,0) ;
  S7(0,0) ;
  S9(0) ;
  S14(0) ;
}

Where T_10 stands for the parameter N,

S4 stands for
         d1(k,0) = fctr1/(t*(x(k)-x(0)))
         fctr = -fctr1

S5 stands for "if (j.ne.k)"
S6 stands for "d1(k,j) = fctr/(x(k)-x(j))"
S7 stands for "fctr = -fctr".

S14 stands for
         d1(k,N) = fctr/(t*(x(k)-x(N)))
         fctr1 = -fctr1

So the error seems to be that we are splitting the index scat_3 into
two ranges that are not ordered in the same way as in the original
loop nest:

for (scat_1=1;scat_1<=T_10-3;scat_1++) {
  S4(scat_1) ;
  S5(scat_1,scat_1) ;
  S7(scat_1,scat_1) ;
  for (scat_3=scat_1+1;scat_3<=T_10-2;scat_3++) {
    S5(scat_1,scat_3) ;
    S6(scat_1,scat_3) ;
    S7(scat_1,scat_3) ;
  }
  for (scat_3=0;scat_3<=scat_1-1;scat_3++) {
    S5(scat_1,scat_3) ;
    S6(scat_1,scat_3) ;
    S7(scat_1,scat_3) ;
  }
  S9(scat_1) ;
  S14(scat_1) ;
}

We are executing the range [scat_1+1, T_10-2] before executing the
range [0, scat_1-1].
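
To illustrate why the order of the two scat_3 ranges matters, here is a small Python sketch that tracks only the sign of fctr as S5/S6/S7 run over one inner loop. It is not GCC or CLooG code: the helper names (signs_seen, original_order, split_order), the 0-based indexing of scat_3, and the sizes n = 5 and k = 2 are all assumptions chosen for the illustration. Since S7 flips fctr on every iteration, the value S6 reads depends on how many iterations ran before it.

```python
def signs_seen(order, k):
    """Return {j: sign of fctr when S6 runs at j} for a given visiting order."""
    fctr = 1  # only the sign is modelled, not the real value
    seen = {}
    for j in order:
        if j != k:           # S5: the guard "if (j.ne.k)"
            seen[j] = fctr   # S6: d1(k,j) = fctr/(x(k)-x(j)), sign only
        fctr = -fctr         # S7: fctr = -fctr
    return seen


def original_order(n):
    """Inner loop order of the source program: 0, 1, ..., n-1."""
    return list(range(n))


def split_order(n, k):
    """Order in the generated code: diagonal, then [k+1, n-1], then [0, k-1]."""
    return [k] + list(range(k + 1, n)) + list(range(0, k))


n, k = 5, 2  # hypothetical sizes for the illustration
orig = signs_seen(original_order(n), k)
split = signs_seen(split_order(n, k), k)
mismatched = sorted(j for j in orig if orig[j] != split[j])
print(mismatched)  # prints [0, 1]
```

With these sizes, the entries for j below k are written with the opposite sign once the upper range is executed first, which is the kind of wrong result the reordering can produce.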
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42181