------- Comment #4 from dominiq at lps dot ens dot fr  2007-12-03 07:45 -------
The code in comment #3 is indeed inlined, but some cases are not. For instance,
if you compile the Polyhedron test 'channel' with -O3 -ffast-math
-funroll-loops and grep the generated assembly for _ddx, you get:

_ddx.837:
        call    _ddx.837
        call    _ddx.837

If you apply the following patch to channel.f90, i.e., inline the calls
by hand,

--- channel.f90 2005-10-11 22:53:32.000000000 +0200
+++ chan.v2.f90 2007-11-29 21:30:25.000000000 +0100
@@ -145,10 +145,22 @@

     ! ------ interior calculations ------ !

-    dudx = ddx(u(:,:,mid))
-    dvdy = ddy(v(:,:,mid))
-    dhdx = ddx(h(:,:,mid))
-    dhdy = ddy(h(:,:,mid))
+    dudx(2:M-1,:) = u(3:M,: ,mid)-u(1:M-2,: ,mid)    ! interior points
+    dudx(1,:) = 2*(u(2,: ,mid)-u(  1,: ,mid))
+    dudx(M,:) = 2*(u(M,: ,mid)-u(M-1,: ,mid))
+    
+    dvdy(:,2:N-1) = v(:,3:N ,mid)-v(:,1:N-2 ,mid)    ! interior points
+    dvdy(:,1) = 2*(v(:,2 ,mid)-v(:,  1 ,mid))
+    dvdy(:,N) = 2*(v(:,N ,mid)-v(:,N-1 ,mid))
+
+    dhdx(2:M-1,:) = h(3:M,: ,mid)-h(1:M-2,: ,mid)    ! interior points
+    dhdx(1,:) = 2*(h(2,: ,mid)-h(  1,: ,mid))
+    dhdx(M,:) = 2*(h(M,: ,mid)-h(M-1,: ,mid))
+
+    dhdy(:,2:N-1) = h(:,3:N ,mid)-h(:,1:N-2 ,mid)    ! interior points
+    dhdy(:,1) = 2*(h(:,2 ,mid)-h(:,  1 ,mid))
+    dhdy(:,N) = 2*(h(:,N ,mid)-h(:,N-1 ,mid))
+

     u(2:M-1,1:N,new) = u(2:M-1,1:N,old) &               ! interior u points
         +2.d0*dt*f(2:M-1,1:N)*v(2:M-1,1:N,mid) &
@@ -234,38 +246,6 @@
                 0.5*(v(i,j,mid)+v(i,j-1,mid))

 !------------------------------------------------------------
-contains
-!------------------------------------------------------------
-    function ddx(array)
-    implicit double precision (a-h,o-z)
-    double precision::          array(:,:)
-    double precision::          ddx(size(array,dim=1),size(array,dim=2))
-
-    I = size(array,dim=1)
-    J = size(array,dim=2)
-
-    ddx(2:I-1,1:J) = array(3:I,1:J)-array(1:I-2,1:J)    ! interior points
-
-    ddx(1,1:J) = 2*(array(2,1:J)-array(  1,1:J))
-    ddx(I,1:J) = 2*(array(I,1:J)-array(I-1,1:J))
-
-    end function ddx
-
-    function ddy(array)
-    implicit double precision (a-h,o-z)
-    double precision::          array(:,:)
-    double precision::          ddy(size(array,dim=1),size(array,dim=2))
-
-    I = size(array,dim=1)
-    J = size(array,dim=2)
-
-    ddy(1:I,2:J-1) = array(1:I,3:J)-array(1:I,1:J-2)    ! interior points
-
-    ddy(1:I,1) = 2*(array(1:I,2)-array(1:I,  1))
-    ddy(1:I,J) = 2*(array(1:I,J)-array(1:I,J-1))
-
-    end function ddy
-!------------------------------------------------------------
 end program sw

 !------------------------------------------------------------

the run time on an Intel Core 2 Duo at 2.16 GHz drops from 4 s to 2.2 s.
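As an illustration only, here is a hypothetical stand-alone program (check_inlined_ddx is not part of channel.f90 or the Polyhedron suite) that checks whether the hand-inlined expressions from the patch produce exactly the same array as the contained function ddx. It assumes a small M = 5 by N = 4 grid filled with integer-valued entries, so the comparison can be exact. The ddx body is copied from the removed code, with the implicit typing replaced by explicit declarations.

```fortran
! Hypothetical test harness (not part of channel.f90): compares the
! contained function ddx with the same stencil written inline.
program check_inlined_ddx
  implicit none
  integer, parameter :: M = 5, N = 4
  double precision :: u(M, N), d_func(M, N), d_inl(M, N)
  integer :: i, j

  ! Arbitrary non-linear, integer-valued test data (exact in double precision)
  do j = 1, N
     do i = 1, M
        u(i, j) = dble(i*i + 3*j)
     end do
  end do

  ! Version 1: call the contained function, as the original code does
  d_func = ddx(u)

  ! Version 2: the same stencil written inline, as in the patch
  d_inl(2:M-1, :) = u(3:M, :) - u(1:M-2, :)     ! interior points
  d_inl(1, :) = 2*(u(2, :) - u(1, :))
  d_inl(M, :) = 2*(u(M, :) - u(M-1, :))

  ! Stop with a nonzero code if the two versions disagree anywhere
  if (any(d_func /= d_inl)) then
     print *, 'MISMATCH'
     stop 1
  end if
  print *, 'OK'

contains

  ! Same stencil as ddx in channel.f90, with explicit declarations
  ! instead of "implicit double precision (a-h,o-z)"
  function ddx(array)
    double precision, intent(in) :: array(:, :)
    double precision :: ddx(size(array, dim=1), size(array, dim=2))
    integer :: ni, nj
    ni = size(array, dim=1)
    nj = size(array, dim=2)
    ddx(2:ni-1, 1:nj) = array(3:ni, 1:nj) - array(1:ni-2, 1:nj)   ! interior points
    ddx(1, 1:nj) = 2*(array(2, 1:nj) - array(1, 1:nj))
    ddx(ni, 1:nj) = 2*(array(ni, 1:nj) - array(ni-1, 1:nj))
  end function ddx

end program check_inlined_ddx
```

Built with gfortran and run, it should print OK; any difference between the two versions would make it print MISMATCH and execute STOP 1. This only checks that the manual inlining is a faithful rewrite; it says nothing about why GCC declined to inline ddx itself.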

So my question is: what rules does GCC apply when deciding whether to inline a
function? I understand that with -Os one rule is that inlining must not
increase code size, but why were the two calls to ddx in channel.f90 not
inlined at -O3?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29648
