------- Comment #9 from dominiq at lps dot ens dot fr 2009-11-20 13:45 -------

I am rather confused by some comments:
(1) Although I am not fluent in x86 assembly, I am pretty sure that no code in
eval is vectorized (assembly taken from this PR or from the original post
http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).

(2) If I am not mistaken, the k loop always handles 3 elements: i, i+n, and
i+2*n.

(3) On a Core 2 Duo 2.1GHz, I see only small changes in the timing from 4.3.4
to trunk, from -O1 to -O3, and between 32- and 64-bit mode.

Now if I do the following change:

--- pr42108_1_db.f90    2009-11-20 14:14:05.000000000 +0100
+++ pr42108_1_db_1.f90  2009-11-20 14:15:24.000000000 +0100
@@ -7,12 +7,10 @@ subroutine eval(foo1,foo2,foo3,foo4,x,n
    do i=2,n
       foo3(i)=foo2*foo4(i)
       do j=1,i-1
-         temp=0.0d0
-         jmini=j-i
-         do k=i,nnd,n
-            temp=temp+(x(k)-x(k+jmini))**2
-         end do
-         temp = sqrt(temp+foo1)
+         temp = sqrt( (x(i)    - x(j))**2     &
+                     +(x(i+n)  - x(j+n))**2   &
+                     +(x(i+2*n)-x(j+2*n))**2 &
+                     +foo1)
          foo3(i)=foo3(i)+temp*foo4(j)
          foo3(j)=foo3(j)+temp*foo4(i)
       end do

I go from 9.2s to 5.5s for n=20000. So the k loop is not automatically unrolled
even with -funroll-loops.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
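
For anyone who wants to check point (2), here is a minimal sketch that compares
the strided k loop with the hand-unrolled expression on a small array. It
assumes nnd = 3*n, which is my reading of the loop bounds and is not shown in
the diff. The program name, the value of n, the value of foo1, and the contents
of x are made up for illustration only.

```fortran
! Sketch only: nnd = 3*n is an assumption inferred from point (2);
! n, foo1 and the contents of x are arbitrary illustration values.
program check_unroll
  implicit none
  integer, parameter :: n = 5, nnd = 3*n
  double precision :: x(nnd), foo1, t_loop, t_flat, maxdiff
  integer :: i, j, k, jmini

  foo1 = 0.5d0
  do k = 1, nnd
     x(k) = sin(dble(k))
  end do

  maxdiff = 0.0d0
  do i = 2, n
     do j = 1, i-1
        ! Original form: strided loop visiting k = i, i+n, i+2*n.
        t_loop = 0.0d0
        jmini = j - i
        do k = i, nnd, n
           t_loop = t_loop + (x(k) - x(k+jmini))**2
        end do
        t_loop = sqrt(t_loop + foo1)

        ! Hand-unrolled form, as in the patch.
        t_flat = sqrt( (x(i)     - x(j))**2     &
                      +(x(i+n)   - x(j+n))**2   &
                      +(x(i+2*n) - x(j+2*n))**2 &
                      +foo1)

        ! Track the largest disagreement between the two forms.
        maxdiff = max(maxdiff, abs(t_loop - t_flat))
     end do
  end do

  print *, 'max |loop - unrolled| =', maxdiff
end program check_unroll
```

If the assumption holds, the printed difference should be zero or at rounding
level, since both forms add the same three squared differences before foo1.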