gfortran seemingly generates a significantly inferior internal TREE representation compared with g95: for Polyhedron's induct.f90, gfortran is 18% slower than g95, which is based on GCC 4.0.3. (Compared with other compilers, the difference is even larger.)
(GCC 4.3 is in general faster than GCC 4.1; for induct, however, one does not see any runtime change with the gfortran frontend during the last 1.5 years, though GCC/gfortran 4.1.2 was seemingly slightly faster:
http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-induct-19.png )

Looking at the output of -ftree-vectorizer-verbose, GCC 4.3 is able to vectorize 3 loops with gfortran, whereas GCC 4.0 vectorizes 0 loops with g95.

For a reduced-size example (395 instead of 6635 lines), gfortran is still 13% slower:

$ gfortran -march=opteron -ffast-math -funroll-loops -ftree-vectorize -ftree-loop-linear -msse3 -O3 test2.f90
$ time a.out
real 0m4.632s
user 0m4.624s
sys 0m0.004s

$ g95 -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 test2.f90
$ time a.out
real 0m4.030s
user 0m4.024s
sys 0m0.004s

$ ifort test2.f90
$ time a.out
real 0m3.859s
user 0m3.856s
sys 0m0.000s

# NAG f95 + system gcc 4.1.3
$ f95 -O4 -ieee=full -Bstatic -march=opteron -ffast-math -funroll-loops -ftree-vectorize -msse3 test2.f90
$ time a.out
real 0m3.381s
user 0m3.380s
sys 0m0.004s

$ sunf95 -w4 -fast -xarch=amd64a -xipo=0 test2.f90
$ time a.out
real 0m3.741s
user 0m3.736s
sys 0m0.000s

For induct (on x86_64-unknown-linux-gnu), runtime in seconds and relative to the first row:

51.65 [100%] gfortran -m64 as above
51.90 [100%] gfortran with -fprofile-use
61.41 [118%] gfortran 32bit, x87
46.12 [ 89%] gfortran 32bit, SSE
43.33 [ 83%] ifort 9.1
40.73 [ 78%] ifort 10beta
42.53 [ 82%] sunf95
30.16 [ 58%] pathscale
38.86 [ 75%] NAG f95 using system gcc 4.1
42.65 [ 82%] g95/gcc 4.0.3 (g95 0.91!)

-- 
Summary: gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: burnus at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
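
For anyone re-checking the figures: the bracketed percentages in the induct table appear to be each runtime divided by the 64-bit gfortran runtime (this reading is an assumption, but it matches the numbers). The 13% and 18% in the summary then correspond to g95 needing about 87% and 82% of gfortran's time. A minimal Python sketch of that arithmetic, using a subset of the timings quoted in this report; the helper name relative_percent is made up for illustration:

```python
# Runtimes in seconds for induct, copied from the table in this report.
induct_runtimes = {
    "gfortran -m64": 51.65,
    "gfortran 32bit, x87": 61.41,
    "ifort 10beta": 40.73,
    "pathscale": 30.16,
    "g95/gcc 4.0.3": 42.65,
}


def relative_percent(seconds, baseline):
    """Return a runtime as a percentage of the baseline runtime."""
    return 100.0 * seconds / baseline


# Assumption: the 64-bit gfortran run is the 100% reference row.
baseline = induct_runtimes["gfortran -m64"]
for name, seconds in induct_runtimes.items():
    percent = relative_percent(seconds, baseline)
    print(f"{seconds:6.2f} [{percent:5.1f}%] {name}")

# Reduced test case: g95 (4.030 s) relative to gfortran (4.632 s).
reduced = relative_percent(4.030, 4.632)
print(f"reduced example: g95 needs {reduced:.1f}% of gfortran's runtime")
```

The script prints one decimal place, whereas the table in this report gives whole percentages, so small differences in the last digit are expected.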