I was helping a friend debug some code and gave him some lousy advice based
on what I saw in gfortran. It concerned the in-place swap algorithm: I told
him he would not benefit from using it on a modern system. The real cause
turned out to be the "fancy notation" I had used in my in-place version.

I wrote the swap like this:

ARR((/J,N-J/)) = ARR((/N-J,J/))

Instead of doing this explicitly:

TEMP = ARR(N-J)
ARR(N-J) = ARR(J)
ARR(J) = TEMP

The two forms should be equally fast, am I right?
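
To make the comparison concrete, here is a minimal sketch that checks the
two forms give the same result on a small array. It is not part of the
attached benchmark: the program name SWAPCHECK, the arrays A and B, and the
size N = 8 are made up for illustration. It also assumes the mirror index of
J is N-J+1 rather than the N-J used in this report. With N-J, the last
iteration (J = N/2) would put the same index twice in the vector subscript on
the left-hand side, which as far as I know the Fortran standard does not
allow.

```fortran
! Sketch only: checks that the vector-subscript swap and the explicit
! TEMP swap reverse an array identically.
! Assumption: the mirror of J is N-J+1 (the report itself uses N-J).
PROGRAM SWAPCHECK
        IMPLICIT NONE
        INTEGER, PARAMETER :: N = 8
        INTEGER            :: J
        REAL               :: A(N), B(N), TEMP
        ! Fill both arrays with 1.0, 2.0, ..., N
        A = (/ (REAL(J), J = 1, N) /)
        B = A
        DO J = 1, N/2
                ! Array notation: the right-hand side is evaluated in
                ! full before either element is stored.
                A((/J,N-J+1/)) = A((/N-J+1,J/))
                ! Explicit swap through a scalar temporary.
                TEMP = B(N-J+1)
                B(N-J+1) = B(J)
                B(J) = TEMP
        END DO
        ! Both arrays should now hold N, N-1, ..., 1
        IF (ANY(A /= B)) STOP 1
        PRINT *, A
END PROGRAM SWAPCHECK
```

If the two forms agree, the program prints the reversed array. If they do
not, it stops with code 1.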

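A few benchmark runs (test code is attached below). In each run the first
number is the time for the whole-array reversal and the second is the time
for the in-place loop: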

na56:/tmp>gfortran -O0 inplace.f90
na56:/tmp>./a.out
   2.812175    
   9.240579    

na56:/tmp>gfortran -O3 inplace.f90
na56:/tmp>./a.out
  0.5120320    
  0.9320580    

Let's compare with the Intel compiler:
Intel(R) Fortran Compiler for applications running on IA-32, Version 10.1   
Build 20070913 Package ID: l_fc_p_10.1.008

na56:/tmp>ifort -O0 inplace.f90 
na56:/tmp>./a.out 
   2.160135    
   2.852178 

na56:/tmp>ifort -fast inplace.f90
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_ifortgSsRZ7.o
inplace.f90(10): (col. 17) remark: LOOP WAS VECTORIZED.
na56:/tmp>./a.out 
  0.3720230    
  0.2120130    

Note that line 10 is the whole-array assignment, not the in-place loop. It is
not hard to understand why -O0 is slower (the inner loop must be unrolled).

Now let's try the "fixed" version with an explicit TEMP variable:

na56:/tmp>gfortran -O0 inplacefix.f90
na56:/tmp>./a.out
   2.760172    
  0.8600540    

na56:/tmp>gfortran -O3 inplacefix.f90
na56:/tmp>./a.out
  0.5080310    
  0.2280150    

(Almost on par with ifort, nice!)

I do not know if this is fixed in 4.3. If so, sorry for the duplicate bug.

Have a nice Xmas!
/Henrik Holst

------------------------------------------------------------------

PROGRAM INPLACE
        IMPLICIT NONE
        INTEGER, PARAMETER :: N = 2**15
        INTEGER, PARAMETER :: M = 10000
        INTEGER            :: I, J
        REAL               :: ARR(N)
        REAL               :: T0, T1, T2
        CALL CPU_TIME(T0)
        DO I = 1, M
                ARR = ARR(N:1:-1)
        END DO
        CALL CPU_TIME(T1)
        DO I = 1, M
                DO J = 1, N/2
                        ARR((/J,N-J/)) = ARR((/N-J,J/))
                END DO
        END DO
        CALL CPU_TIME(T2)
        PRINT *, T1 - T0
        PRINT *, T2 - T1
END PROGRAM INPLACE

PROGRAM INPLACEFIX
        IMPLICIT NONE
        INTEGER, PARAMETER :: N = 2**15
        INTEGER, PARAMETER :: M = 10000
        INTEGER            :: I, J
        REAL               :: ARR(N), TEMP
        REAL               :: T0, T1, T2
        CALL CPU_TIME(T0)
        DO I = 1, M
                ARR = ARR(N:1:-1)
        END DO
        CALL CPU_TIME(T1)
        DO I = 1, M
                DO J = 1, N/2
                        TEMP = ARR(N-J)
                        ARR(N-J) = ARR(J)
                        ARR(J) = TEMP
                END DO
        END DO
        CALL CPU_TIME(T2)
        PRINT *, T1 - T0
        PRINT *, T2 - T1
END PROGRAM INPLACEFIX


-- 
           Summary: Inplace algorithm too slow when using array notation
           Product: gcc
           Version: 4.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: holst at matmech dot com
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34568
