I was helping a friend debug some code and gave him some lousy suggestions based on what I saw in gfortran. It concerned the in-place swap algorithm: I told him he would not benefit from using it on a modern system. I came to that conclusion because I had used a "fancy notation" in my in-place algorithm:
I wrote the swap like this:

    ARR((/J,N-J/)) = ARR((/N-J,J/))

instead of doing it explicitly:

    TEMP = ARR(N-J)
    ARR(N-J) = ARR(J)
    ARR(J) = TEMP

The two versions should be equally fast, am I right?

A few benchmark tests (test code attached below). In each run, the first number is the time for the whole-array assignment and the second is the time for the in-place swap loop:

    na56:/tmp>gfortran -O0 inplace.f90
    na56:/tmp>./a.out
       2.812175
       9.240579
    na56:/tmp>gfortran -O3 inplace.f90
    na56:/tmp>./a.out
      0.5120320
      0.9320580

Let's compare with the Intel compiler:

    Intel(R) Fortran Compiler for applications running on IA-32, Version 10.1
    Build 20070913 Package ID: l_fc_p_10.1.008

    na56:/tmp>ifort -O0 inplace.f90
    na56:/tmp>./a.out
       2.160135
       2.852178
    na56:/tmp>ifort -fast inplace.f90
    ipo: remark #11001: performing single-file optimizations
    ipo: remark #11005: generating object file /tmp/ipo_ifortgSsRZ7.o
    inplace.f90(10): (col. 17) remark: LOOP WAS VECTORIZED.
    na56:/tmp>./a.out
      0.3720230
      0.2120130

Notice that line 10 is the array assignment and not the in-place method. It is not hard to understand why -O0 is slower (the inner loop must be unrolled).

Now let's try the "fixed" version with an explicit TEMP variable:

    na56:/tmp>gfortran -O0 inplacefix.f90
    na56:/tmp>./a.out
       2.760172
      0.8600540
    na56:/tmp>gfortran -O3 inplacefix.f90
    na56:/tmp>./a.out
      0.5080310
      0.2280150

(Almost on par with ifort, nice!)

I do not know if this is fixed in 4.3. If so, sorry for the duplicate bug.

Have a nice Xmas!
/Henrik Holst

------------------------------------------------------------------
PROGRAM INPLACE
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 2**15
  INTEGER, PARAMETER :: M = 10000
  INTEGER :: I, J
  REAL :: ARR(N)
  REAL :: T0, T1, T2
  CALL CPU_TIME(T0)
  DO I = 1, M
    ARR = ARR(N:1:-1)
  END DO
  CALL CPU_TIME(T1)
  DO I = 1, M
    DO J = 1, N/2
      ARR((/J,N-J/)) = ARR((/N-J,J/))
    END DO
  END DO
  CALL CPU_TIME(T2)
  PRINT *, T1 - T0
  PRINT *, T2 - T1
END PROGRAM INPLACE

PROGRAM INPLACEFIX
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 2**15
  INTEGER, PARAMETER :: M = 10000
  INTEGER :: I, J
  REAL :: ARR(N), TEMP
  REAL :: T0, T1, T2
  CALL CPU_TIME(T0)
  DO I = 1, M
    ARR = ARR(N:1:-1)
  END DO
  CALL CPU_TIME(T1)
  DO I = 1, M
    DO J = 1, N/2
      TEMP = ARR(N-J)
      ARR(N-J) = ARR(J)
      ARR(J) = TEMP
    END DO
  END DO
  CALL CPU_TIME(T2)
  PRINT *, T1 - T0
  PRINT *, T2 - T1
END PROGRAM INPLACEFIX

-- 
           Summary: Inplace algorithm too slow when using array notation
           Product: gcc
           Version: 4.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: holst at matmech dot com
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34568
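
For reference, here is a minimal sketch, not part of the attached test case, that checks whether the vector-subscript swap and the explicit TEMP swap give the same result as the whole-array reversal. The program name SWAPCHECK, the arrays A, B and C, and the small size N = 8 are made up for illustration. It pairs J with N-J+1, which is what a full reversal needs. The test programs above pair J with N-J instead, so they never touch ARR(N), and in the last iteration (J = N/2) both entries of the vector subscript are the same index. As far as I understand the standard, a repeated index is not allowed in a vector subscript on the left-hand side of an assignment. This presumably does not change the timing comparison, but it is worth knowing when reusing the code.

```fortran
PROGRAM SWAPCHECK
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8   ! small size, chosen only for illustration
  INTEGER :: J
  REAL :: A(N), B(N), C(N), TEMP

  ! Fill all three arrays with 1.0, 2.0, ..., N
  A = (/ (REAL(J), J = 1, N) /)
  B = A
  C = A

  ! Reference: whole-array reversal
  A = A(N:1:-1)

  ! Vector-subscript swap; J and N-J+1 are always distinct for J <= N/2
  DO J = 1, N/2
    B((/J, N-J+1/)) = B((/N-J+1, J/))
  END DO

  ! Explicit swap through a temporary
  DO J = 1, N/2
    TEMP = C(N-J+1)
    C(N-J+1) = C(J)
    C(J) = TEMP
  END DO

  ! Prints T when all three reversals agree
  PRINT *, ALL(A == B) .AND. ALL(A == C)
END PROGRAM SWAPCHECK
```

Compiling this with the same flags as above (for example gfortran -O0 and -O3) is a quick way to confirm that the two notations are interchangeable in result, so any difference between them is purely a matter of generated code.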