[Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107

dominiq at lps dot ens.fr Sun, 11 Dec 2011 06:08:36 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497


--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-11 14:07:59 UTC ---
Looking further at the assembly, I have found that the seven loops in
spmmult are all vectorized without -flto, while none of them is with -flto.

For nf2dprecon, after trisolve has been inlined, the code looks like

subroutine NF2DPrecon(x,gi,au1,au2,i1,i2,nx)       ! 2D NF Preconditioning matrix
implicit none
integer :: i1,i2,nx
real(8),dimension(i2)::x,t,gi,au1,au2
integer :: i,j
do i = i1 , i2 , nx
   if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
   x(i) = gi(i)* x(i)
   do j = i+1 , i+nx-1
      x(j) = gi(j)*(x(j)-au1(j-1)*x(j-1))
   enddo
   do j = i+nx-2 , i , -1
      x(j) = x(j) - gi(j)*au1(j)*x(j+1)
   enddo
enddo 
do i = i2-2*nx+1 , i1 , -nx
   t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
   t(i) = gi(i)* t(i)
   do j = i+1 , i+nx-1
      t(j) = gi(j)*(t(j)-au1(j-1)*t(j-1))
   enddo
   do j = i+nx-2 , i , -1
      t(j) = t(j) - gi(j)*au1(j)*t(j+1)
   enddo
   x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
enddo
end subroutine NF2DPrecon            !=========================================

where none of the explicit 'do j' loops are vectorized ("possible dependence
between data-refs"). The three implicit loops are all vectorized without
-flto, but only the last two are vectorized with -flto. Note that the first
loop, which is not vectorized with -flto:

x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)

is vectorized without -flto, with the message "created 1 versioning for alias
checks." (Is the alias check between au2 and x? If so, valid Fortran code
guarantees that there is no aliasing.)
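
To look at this statement on its own, a reduced sketch along the following
lines might help. It is not taken from nf.f90: the names first_update and
driver, the intent attributes and the array sizes are assumptions made for
illustration, and I have not checked that it shows the same difference
with and without -flto.

```fortran
! Hypothetical reduction: only the first array statement of NF2DPrecon,
! isolated in its own subroutine so the vectorizer report covers one loop.
subroutine first_update(x, au2, i1, i2, nx)
   implicit none
   integer, intent(in) :: i1, i2, nx
   real(8), dimension(i2), intent(inout) :: x
   real(8), dimension(i2), intent(in)    :: au2
   integer :: i
   do i = i1, i2, nx
      ! The right-hand side reads x(i-nx:i-1), which does not overlap the
      ! section x(i:i+nx-1) being written as long as nx >= 1.
      if (i > i1) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
   enddo
end subroutine first_update

! Small driver (sizes chosen arbitrarily) so that the subroutine has a
! caller and the result is used.
program driver
   implicit none
   integer, parameter :: nx = 4, n = 16
   real(8) :: x(n), au2(n)
   x = 1.0d0
   au2 = 0.5d0
   call first_update(x, au2, 1, n, nx)
   print *, x
end program driver
```

Compiling this with the same options as the full test, once with -flto and
once without, and comparing the vectorizer reports should show whether the
alias versioning for this statement is what gets lost.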
