L.S.,
Last week, a colleague of mine from Meteo France held a talk at the
yearly meeting of all researchers working on HARMONIE (see
http://hirlam.org) discussing the performance of our code when compiled
with each of the supported compilers on the Cray XC30 at ECMWF
(http://www.ecmwf.int/en/computing/our-facilities).
In the context of GCC this is relevant, because one of the three
compilers is gfortran (version 4.9.2).
One of his slides discussed the differences in optimizations that the
three compilers offer; I was surprised to learn that GCC/gfortran
doesn't do loop fusion *at all*. Note, I discussed loop fusion (among
other optimizations) at LinuxExpo 99 (http://moene.org/~toon/nwp.ps)
which, unsurprisingly, was held 16 years ago :-)
Why is loop fusion important, especially in Fortran 90 and later programs?
Because without it, every array assignment becomes its own loop nest,
isolated from related, same-shape assignments.
Consider this (artificial, but typical) example [updating atmospheric
quantities after the computation of the rate of change during a time
step of the integration]:
      SUBROUTINE UPDATE_DT(T, U, V, Q, DTDT, DUDT, DVDT, DQDT, &
                         & NLON, NLAT, NLEV, TSTEP)
      ...
      REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q, DTDT, DUDT, DVDT, DQDT
      ...
      T = T + TSTEP*DTDT ! Update temperature
      U = U + TSTEP*DUDT ! Update east-west wind component
      V = V + TSTEP*DVDT ! Update north-south wind component
      Q = Q + TSTEP*DQDT ! Update specific humidity
      ...
      END
This generates four consecutive three-deep loop nests over NLEV, NLAT, NLON.
Of course, it would be much more efficient if this were just one loop
nest, as Fortran 77 programmers would write it:
      DO JLEV = 1, NLEV
         DO JLAT = 1, NLAT
            DO JLON = 1, NLON
               T(JLON, JLAT, JLEV) = T(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
               U(JLON, JLAT, JLEV) = U(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
               V(JLON, JLAT, JLEV) = V(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
               Q(JLON, JLAT, JLEV) = Q(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
            ENDDO
         ENDDO
      ENDDO
After a loop fusion optimization pass, the Fortran 90 and the Fortran 77
code should result in the same assembler output.
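
For anyone who wants to experiment, the two variants can be put side by side
in one small test program. This is only a minimal sketch: the program name,
the grid sizes, the time step value, the random input data and the tolerance
are illustrative assumptions, not taken from HARMONIE. It merely checks that
the array-syntax update and the hand-fused loop nest compute the same fields,
which is the property a fusion pass would have to preserve.

```fortran
PROGRAM FUSION_CHECK
  IMPLICIT NONE
  ! Hypothetical grid sizes and time step, chosen only for this sketch.
  INTEGER, PARAMETER :: NLON = 8, NLAT = 6, NLEV = 4
  REAL, PARAMETER :: TSTEP = 0.5
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q        ! array-syntax copies
  REAL, DIMENSION(NLON, NLAT, NLEV) :: TF, UF, VF, QF    ! fused-loop copies
  REAL, DIMENSION(NLON, NLAT, NLEV) :: DTDT, DUDT, DVDT, DQDT
  REAL :: TOL
  INTEGER :: JLON, JLAT, JLEV

  ! Arbitrary input data; both variants start from identical fields.
  CALL RANDOM_NUMBER(T);    CALL RANDOM_NUMBER(U)
  CALL RANDOM_NUMBER(V);    CALL RANDOM_NUMBER(Q)
  CALL RANDOM_NUMBER(DTDT); CALL RANDOM_NUMBER(DUDT)
  CALL RANDOM_NUMBER(DVDT); CALL RANDOM_NUMBER(DQDT)
  TF = T; UF = U; VF = V; QF = Q

  ! Fortran 90 style: four array assignments, i.e. four separate loop
  ! nests unless the compiler fuses them.
  T = T + TSTEP*DTDT
  U = U + TSTEP*DUDT
  V = V + TSTEP*DVDT
  Q = Q + TSTEP*DQDT

  ! Fortran 77 style: the same updates fused by hand into one loop nest.
  DO JLEV = 1, NLEV
     DO JLAT = 1, NLAT
        DO JLON = 1, NLON
           TF(JLON, JLAT, JLEV) = TF(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
           UF(JLON, JLAT, JLEV) = UF(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
           VF(JLON, JLAT, JLEV) = VF(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
           QF(JLON, JLAT, JLEV) = QF(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
        ENDDO
     ENDDO
  ENDDO

  ! Allow a little rounding slack in case the compiler contracts the
  ! multiply-add differently in the two variants.
  TOL = 10.0*EPSILON(1.0)
  IF (MAXVAL(ABS(T - TF)) > TOL .OR. MAXVAL(ABS(U - UF)) > TOL .OR. &
      MAXVAL(ABS(V - VF)) > TOL .OR. MAXVAL(ABS(Q - QF)) > TOL) THEN
     ERROR STOP 'fused and unfused updates differ'
  END IF
  PRINT *, 'fused and unfused updates agree'
END PROGRAM FUSION_CHECK
```

The check says nothing about performance; it only confirms that fusing these
four assignments is legal here, because each statement reads and writes its
own arrays and no assignment depends on the result of another.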
Is this something the Graphite infrastructure could help with? From the
wiki documentation I get the impression that it only works on single
loop nests, but I must confess that I am not familiar with the
nomenclature in its description ...
If not, would it be hard to write a separate loop fusion pass?
Kind regards,
--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news