------- Comment #28 from dominiq at lps dot ens dot fr 2009-08-31 23:59 -------
Following Richard Guenther's suggestion on IRC, I have tested the following
patch:
--- ../_gcc_clean/gcc/builtins.c 2009-08-31 15:07:18.000000000 +0200
+++ gcc/builtins.c 2009-09-01 01:28:09.000000000 +0200
@@ -3012,7 +3012,7 @@
real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
if (real_identical (&c2, &cint)
&& ((flag_unsafe_math_optimizations
- && optimize_insn_for_speed_p ()
+ /* && optimize_insn_for_speed_p () */
&& powi_cost (n/2) <= POWI_MAX_MULTS)
|| n == 1))
{
With it I get:
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.490u 0.018s 0:04.51 99.7% 0+0k 0+3io 0pf+0w
compared to
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations -fno-strict-overflow air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.320u 0.015s 0:04.34 99.7% 0+0k 0+0io 0pf+0w
and there is no call to pow in the assembly. I think the difference is
significant, so it seems that optimize_insn_for_speed_p () is playing some role
elsewhere in the code. Note that if I replace lines 322 and 427:
mu = mu0*(T(i,j)/t02)**1.5*(t02+110.56)/(T(i,j)+110.56)
with
mu = mu0*sqrt((T(i,j)/t02)**3)*(t02+110.56)/(T(i,j)+110.56)
or
mu = mu0*sqrt((T(i,j)/t02))*(T(i,j)/t02)*(t02+110.56)/(T(i,j)+110.56)
there is no call to pow, and the code is slightly faster with
-fno-strict-overflow:
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations -fno-strict-overflow air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.323u 0.015s 0:04.34 99.7% 0+0k 0+0io 0pf+0w
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.527u 0.016s 0:04.55 99.5% 0+0k 0+0io 0pf+0w
The original air.f90 compiled with -fwhole-file gives
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param min-vect-loop-bound=2 -fwhole-file air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.358u 0.049s 0:08.42 99.6% 0+0k 0+8io 0pf+0w
compared to
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param min-vect-loop-bound=2 air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.273u 0.046s 0:08.32 99.8% 0+0k 0+0io 0pf+0w
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106