http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-11 14:07:59 UTC ---

Looking further at the assembly, I have found that the seven loops in
spmmult are all vectorized without -flto, while none of them are with -flto.

For nf2dprecon, after trisolve is inlined, the code looks like

      subroutine NF2DPrecon(x,gi,au1,au2,i1,i2,nx)
      ! 2D NF Preconditioning matrix
      implicit none
      integer :: i1,i2,nx
      real(8),dimension(i2)::x,t,gi,au1,au2
      integer :: i,j
      do i = i1 , i2 , nx
         if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
         x(i) = gi(i)* x(i)
         do j = i+1 , i+nx-1
            x(j) = gi(j)*(x(j)-au1(j-1)*x(j-1))
         enddo
         do j = i+nx-2 , i , -1
            x(j) = x(j) - gi(j)*au1(j)*x(j+1)
         enddo
      enddo
      do i = i2-2*nx+1 , i1 , -nx
         t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
         t(i) = gi(i)* t(i)
         do j = i+1 , i+nx-1
            t(j) = gi(j)*(t(j)-au1(j-1)*t(j-1))
         enddo
         do j = i+nx-2 , i , -1
            t(j) = t(j) - gi(j)*au1(j)*t(j+1)
         enddo
         x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
      enddo
      end subroutine NF2DPrecon
      !=========================================

Here none of the explicit 'do j' loops are vectorized ("possible dependence
between data-refs"), with or without -flto. The three implicit loops (the
array-section assignments) are all vectorized without -flto, but only the
last two are vectorized with -flto.

Note that the first implicit loop, the one that is not vectorized with -flto,

      x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)

is vectorized without -flto with the message "created 1 versioning for alias
checks." (Is the alias check between au2 and x? If so, valid Fortran code
guarantees that there is no such aliasing.)
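
To look at the array-section statement in isolation, something like the following reduced case could be tried. It is only an untested sketch: the names update_row and driver and the array sizes are made up for illustration and do not come from the original source, and there is no guarantee that such a small case reproduces the -flto behaviour, which may depend on the inlining context of the full program.

```fortran
! Hypothetical reduced case; update_row, driver and the sizes are
! illustrative only and are not taken from the benchmark source.
subroutine update_row(x, au2, i, nx, n)
  implicit none
  integer :: i, nx, n
  real(8), dimension(n) :: x, au2
  ! Same statement as the first implicit loop in NF2DPrecon.
  ! x is modified here, so a conforming caller must not pass
  ! overlapping actual arguments for x and au2.
  x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
end subroutine update_row

program driver
  implicit none
  integer, parameter :: n = 8, nx = 4
  real(8) :: x(n), au2(n)
  x = 1.0d0
  au2 = 2.0d0
  ! With i = nx+1 the statement updates x(5:8) from x(1:4) and au2(1:4).
  call update_row(x, au2, nx+1, nx, n)
  print *, x
end program driver
```

With these inputs x(1:4) should stay at 1 and x(5:8) should become -1. Compiling once with and once without -flto, with a vectorizer report enabled (for example -O3 -ftree-vectorizer-verbose=2 on compilers of this vintage, assuming that option is available in the build being tested), would show whether the "versioning for alias checks" message still appears for this statement.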