Some time ago I had a look at PR30388 and got the following results:

          g77 -O2   g95 -O2   gfc -O2   gfc -m64 -O2
MFLOPS:   1063      1061      858       1129
          ref.      g77       -19%      +6%

Since the evening is quite calm, I decided to check whether this speedup with -m64 is generic, and I got the following timings for the Polyhedron test suite:

================================================================================
Date & Time     : 27 Dec 2007 22:24:03
Test Name       : pbharness
Compile Command : gfc %n.f90 -m64 -O3 -ffast-math -funroll-loops -finline-limit=600 --param min-vect-loop-bound=2 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   : 300.0
Target Error %  : 0.200
Minimum Repeats : 2
Maximum Repeats : 5

Benchmark   Compile  Executable  Ave Run  Number   Estim
Name         (secs)     (bytes)   (secs)  Repeats  Err %
---------   -------  ----------  -------  -------  ------
ac             4.27       50712    13.10        2  0.0420
aermod       100.72     1200712    30.19        2  0.0066
air            6.68       73204     9.37        2  0.0267
capacita       3.92       64520    56.49        2  0.0628
channel        2.43       42752     2.29        2  0.0437
doduc         14.42      179504    48.66        2  0.0021
fatigue        5.69       76696    11.17        5  0.3700
gas_dyn        6.32      700392    10.24        5  0.7605
induct        12.79      160672    66.27        2  0.0053
linpk          1.53       38400    27.54        2  0.0000
mdbx           3.77       68856    15.16        2  0.0099
nf            11.69      112312    31.63        2  0.0174
protein       10.71      110048    46.78        2  0.0064
rnflow        10.95      163144    37.28        2  0.0268
test_fpu      10.08      150080    12.72        2  0.0314
tfft           1.37       30488     2.79        2  0.1074

Geometric Mean Execution Time = 18.20 seconds

================================================================================
Date & Time     : 27 Dec 2007 22:44:36
Test Name       : pbharness
Compile Command : gfc %n.f90 -O3 -ffast-math -funroll-loops -finline-limit=600 --param min-vect-loop-bound=2 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   : 300.0
Target Error %  : 0.200
Minimum Repeats : 2
Maximum Repeats : 5

Benchmark   Compile  Executable  Ave Run  Number   Estim
Name         (secs)     (bytes)   (secs)  Repeats  Err %
---------   -------  ----------  -------  -------  ------
ac             4.48       46532    16.88        2  0.0207
aermod       104.92     1288460    37.09        2  0.0081
air            6.67       80956    11.36        5  0.0849
capacita       3.79       68332    62.40        2  0.0048
channel        2.65       50780     2.51        4  0.1828
doduc         14.27      183264    57.41        2  0.0009
fatigue        6.11       84564    14.02        2  0.0642
gas_dyn        5.93      699872    12.01        5  0.2754
induct        11.83      160132    73.59        2  0.0177
linpk          1.67       46512    27.57        2  0.0145
mdbx           3.84       72672    16.78        2  0.0149
nf            16.73      157220    31.86        2  0.0016
protein       11.62      113868    54.90        2  0.0337
rnflow        11.87      187316    45.56        2  0.0889
test_fpu      11.38      182544    14.56        2  0.0653
tfft           1.44       34420     3.03        5  0.2973

Geometric Mean Execution Time = 20.86 seconds

================================================================================
Polyhedron Benchmark Validator
Copyright (C) Polyhedron Software Ltd - 2004 - All rights reserved

The results were obtained on an Intel Core2Duo 2.16 GHz with 2 GB of RAM under Darwin9.1 with gfortran 4.3 at revision 131206.

Is this 10 to 20% speedup with -m64 expected, and how generic is it?

In the assembly code of the inner loop of the test case in PR30388, the main differences I can see are at the level of the addressing: %eax, %ebp, ... in 32-bit mode and %rn, ... in 64-bit mode.

TIA

Dominique
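
As a cross-check of the figures above, here is a minimal Python sketch that recomputes the two geometric means from the "Ave Run" columns and prints the per-benchmark ratio of 32-bit to 64-bit run time. It is not part of the harness output; `times_m64`, `times_m32` and `geometric_mean` are hypothetical names chosen for illustration, and the numbers are simply copied from the two reports.

```python
import math

# Average run times in seconds, copied from the two pbharness reports.
times_m64 = {
    "ac": 13.10, "aermod": 30.19, "air": 9.37, "capacita": 56.49,
    "channel": 2.29, "doduc": 48.66, "fatigue": 11.17, "gas_dyn": 10.24,
    "induct": 66.27, "linpk": 27.54, "mdbx": 15.16, "nf": 31.63,
    "protein": 46.78, "rnflow": 37.28, "test_fpu": 12.72, "tfft": 2.79,
}
times_m32 = {
    "ac": 16.88, "aermod": 37.09, "air": 11.36, "capacita": 62.40,
    "channel": 2.51, "doduc": 57.41, "fatigue": 14.02, "gas_dyn": 12.01,
    "induct": 73.59, "linpk": 27.57, "mdbx": 16.78, "nf": 31.86,
    "protein": 54.90, "rnflow": 45.56, "test_fpu": 14.56, "tfft": 3.03,
}


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x)))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm64 = geometric_mean(times_m64.values())
gm32 = geometric_mean(times_m32.values())

# A ratio above 1 means the -m64 build ran faster on that benchmark.
ratios = {name: times_m32[name] / times_m64[name] for name in times_m64}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:9s} {ratio:5.2f}")
print(f"geometric mean: -m64 {gm64:.2f} s, default {gm32:.2f} s")
```

Sorting by ratio makes it easy to see which benchmarks gain most from -m64 and which barely change, which is one way to judge how generic the speedup is.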