------- Comment #28 from dominiq at lps dot ens dot fr  2009-08-31 23:59 -------
Following Richard Guenther's suggestion on IRC, I have tested the following
patch:

--- ../_gcc_clean/gcc/builtins.c        2009-08-31 15:07:18.000000000 +0200
+++ gcc/builtins.c      2009-09-01 01:28:09.000000000 +0200
@@ -3012,7 +3012,7 @@
       real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
       if (real_identical (&c2, &cint)
          && ((flag_unsafe_math_optimizations
-              && optimize_insn_for_speed_p ()
+              /* && optimize_insn_for_speed_p () */
               && powi_cost (n/2) <= POWI_MAX_MULTS)
              || n == 1))
        {

With it I get:

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.490u 0.018s 0:04.51 99.7%     0+0k 0+3io 0pf+0w

compared to

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations -fno-strict-overflow air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.320u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w

and there is no call to pow in the assembly. I think the difference is
significant; so it seems that optimize_insn_for_speed_p () is playing some
role elsewhere in the code.

Note that if I replace lines 322 and 427

      mu = mu0*(T(i,j)/t02)**1.5*(t02+110.56)/(T(i,j)+110.56)

with

      mu = mu0*sqrt((T(i,j)/t02)**3)*(t02+110.56)/(T(i,j)+110.56)

or

      mu = mu0*sqrt((T(i,j)/t02))*(T(i,j)/t02)*(t02+110.56)/(T(i,j)+110.56)

there is no call to pow and the code is slightly faster with
-fno-strict-overflow

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations -fno-strict-overflow air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.323u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.527u 0.016s 0:04.55 99.5%     0+0k 0+0io 0pf+0w

The original air.f90 compiled with -fwhole-file gives

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param min-vect-loop-bound=2 -fwhole-file air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.358u 0.049s 0:08.42 99.6%     0+0k 0+8io 0pf+0w

compared to

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param min-vect-loop-bound=2 air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.273u 0.046s 0:08.32 99.8%     0+0k 0+0io 0pf+0w

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106