------- Comment #4 from dominiq at lps dot ens dot fr 2007-12-03 07:45 -------
The code in comment #3 is indeed inlined, but some cases are not. For instance, if you compile the Polyhedron test 'channel' with -O3 -ffast-math -funroll-loops and grep the assembly for _ddx, you get:

_ddx.837:
    call _ddx.837
    call _ddx.837

If you apply the following patch to channel.f90, i.e., do the inlining by hand,

--- channel.f90 2005-10-11 22:53:32.000000000 +0200
+++ chan.v2.f90 2007-11-29 21:30:25.000000000 +0100
@@ -145,10 +145,22 @@
 ! ------ interior calculations ------
 !
- dudx = ddx(u(:,:,mid))
- dvdy = ddy(v(:,:,mid))
- dhdx = ddx(h(:,:,mid))
- dhdy = ddy(h(:,:,mid))
+ dudx(2:M-1,:) = u(3:M,: ,mid)-u(1:M-2,: ,mid) ! interior points
+ dudx(1,:) = 2*(u(2,: ,mid)-u( 1,: ,mid))
+ dudx(M,:) = 2*(u(M,: ,mid)-u(M-1,: ,mid))
+
+ dvdy(:,2:N-1) = v(:,3:N ,mid)-v(:,1:N-2 ,mid) ! interior points
+ dvdy(:,1) = 2*(v(:,2 ,mid)-v(:, 1 ,mid))
+ dvdy(:,N) = 2*(v(:,N ,mid)-v(:,N-1 ,mid))
+
+ dhdx(2:M-1,:) = h(3:M,: ,mid)-h(1:M-2,: ,mid) ! interior points
+ dhdx(1,:) = 2*(h(2,: ,mid)-h( 1,: ,mid))
+ dhdx(M,:) = 2*(h(M,: ,mid)-h(M-1,: ,mid))
+
+ dhdy(:,2:N-1) = h(:,3:N ,mid)-h(:,1:N-2 ,mid) ! interior points
+ dhdy(:,1) = 2*(h(:,2 ,mid)-h(:, 1 ,mid))
+ dhdy(:,N) = 2*(h(:,N ,mid)-h(:,N-1 ,mid))
+
 u(2:M-1,1:N,new) = u(2:M-1,1:N,old) & ! interior u points
 +2.d0*dt*f(2:M-1,1:N)*v(2:M-1,1:N,mid) &
@@ -234,38 +246,6 @@
 0.5*(v(i,j,mid)+v(i,j-1,mid))
 !------------------------------------------------------------
-contains
-!------------------------------------------------------------
- function ddx(array)
- implicit double precision (a-h,o-z)
- double precision:: array(:,:)
- double precision:: ddx(size(array,dim=1),size(array,dim=2))
-
- I = size(array,dim=1)
- J = size(array,dim=2)
-
- ddx(2:I-1,1:J) = array(3:I,1:J)-array(1:I-2,1:J) ! interior points
-
- ddx(1,1:J) = 2*(array(2,1:J)-array( 1,1:J))
- ddx(I,1:J) = 2*(array(I,1:J)-array(I-1,1:J))
-
- end function ddx
-
- function ddy(array)
- implicit double precision (a-h,o-z)
- double precision:: array(:,:)
- double precision:: ddy(size(array,dim=1),size(array,dim=2))
-
- I = size(array,dim=1)
- J = size(array,dim=2)
-
- ddy(1:I,2:J-1) = array(1:I,3:J)-array(1:I,1:J-2) ! interior points
-
- ddy(1:I,1) = 2*(array(1:I,2)-array(1:I, 1))
- ddy(1:I,J) = 2*(array(1:I,J)-array(1:I,J-1))
-
- end function ddy
-!------------------------------------------------------------
 end program sw
 !------------------------------------------------------------

the run time on an Intel Core 2 Duo at 2.16 GHz drops from 4 s to 2.2 s. So my question is: what rules does GCC apply when deciding whether to inline? I understand that with -Os one rule is that inlining must not increase the code size, but what prevented ddx from being inlined in channel.f90 with -O3?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29648
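To double-check that the hand inlining in the patch is a pure source transformation (same results, only the calls removed), here is a minimal standalone sketch. It assumes gfortran or another compiler supporting Fortran 2008 (for `error stop`). It copies ddx and ddy from the removed lines, with explicit declarations instead of `implicit double precision`, applies the hand-inlined expressions to a small test array, and stops with an error if the results differ. The program name `check_inline`, the grid size M = 5, N = 4, and the test data are hypothetical. The sketch says nothing about GCC's inlining heuristics themselves.

```fortran
! Hypothetical standalone check, not part of channel.f90: compares the
! hand-inlined expressions from the patch with the original ddx/ddy.
program check_inline
  implicit none
  integer, parameter :: M = 5, N = 4   ! hypothetical small grid
  double precision :: h(M,N), dhdx(M,N), dhdy(M,N)
  integer :: ix, iy

  ! Arbitrary test data (integer-valued, so all differences are exact)
  do iy = 1, N
     do ix = 1, M
        h(ix,iy) = dble(ix*ix + 3*iy*iy*iy)
     end do
  end do

  ! Hand-inlined forms from the patch, with h(:,:,mid) replaced by h
  dhdx(2:M-1,:) = h(3:M,:) - h(1:M-2,:)   ! interior points
  dhdx(1,:)     = 2*(h(2,:) - h(1,:))
  dhdx(M,:)     = 2*(h(M,:) - h(M-1,:))

  dhdy(:,2:N-1) = h(:,3:N) - h(:,1:N-2)   ! interior points
  dhdy(:,1)     = 2*(h(:,2) - h(:,1))
  dhdy(:,N)     = 2*(h(:,N) - h(:,N-1))

  ! Stop with an error if the inlined code differs from the function calls
  if (any(dhdx /= ddx(h))) error stop 'ddx mismatch'
  if (any(dhdy /= ddy(h))) error stop 'ddy mismatch'
  print '(a)', 'inlined expressions match ddx/ddy'

contains

  ! ddx and ddy as in the original channel.f90, but with explicit declarations
  function ddx(array)
    double precision, intent(in) :: array(:,:)
    double precision :: ddx(size(array,dim=1),size(array,dim=2))
    integer :: I, J
    I = size(array,dim=1)
    J = size(array,dim=2)
    ddx(2:I-1,1:J) = array(3:I,1:J) - array(1:I-2,1:J)   ! interior points
    ddx(1,1:J) = 2*(array(2,1:J) - array(1,1:J))
    ddx(I,1:J) = 2*(array(I,1:J) - array(I-1,1:J))
  end function ddx

  function ddy(array)
    double precision, intent(in) :: array(:,:)
    double precision :: ddy(size(array,dim=1),size(array,dim=2))
    integer :: I, J
    I = size(array,dim=1)
    J = size(array,dim=2)
    ddy(1:I,2:J-1) = array(1:I,3:J) - array(1:I,1:J-2)   ! interior points
    ddy(1:I,1) = 2*(array(1:I,2) - array(1:I,1))
    ddy(1:I,J) = 2*(array(1:I,J) - array(1:I,J-1))
  end function ddy

end program check_inline
```

Built with, e.g., `gfortran -O3 check_inline.f90`, it should print "inlined expressions match ddx/ddy". Because the function results are arrays sized from the argument, each out-of-line call also implies a temporary for the result, which the hand-inlined version writes directly into dhdx and dhdy.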