------- Comment #28 from dominiq at lps dot ens dot fr  2009-08-31 23:59 -------
Following Richard Guenther's suggestion on IRC, I have tested the following
patch:

--- ../_gcc_clean/gcc/builtins.c        2009-08-31 15:07:18.000000000 +0200
+++ gcc/builtins.c      2009-09-01 01:28:09.000000000 +0200
@@ -3012,7 +3012,7 @@
       real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
       if (real_identical (&c2, &cint)
          && ((flag_unsafe_math_optimizations
-              && optimize_insn_for_speed_p ()
+              /* && optimize_insn_for_speed_p () */
               && powi_cost (n/2) <= POWI_MAX_MULTS)
              || n == 1))
        {

With it I get:

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.490u 0.018s 0:04.51 99.7%     0+0k 0+3io 0pf+0w

compared to

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.320u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w

and there is no call to pow in the assembly. I think the timing difference is
significant, so it seems that optimize_insn_for_speed_p () also plays some role
elsewhere in the code. Note that if I replace lines 322 and 427

            mu = mu0*(T(i,j)/t02)**1.5*(t02+110.56)/(T(i,j)+110.56)

with

            mu = mu0*sqrt((T(i,j)/t02)**3)*(t02+110.56)/(T(i,j)+110.56)

or

            mu = mu0*sqrt((T(i,j)/t02))*(T(i,j)/t02)*(t02+110.56)/(T(i,j)+110.56)

there is no call to pow and the code is slightly faster with
-fno-strict-overflow:

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.323u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.527u 0.016s 0:04.55 99.5%     0+0k 0+0io 0pf+0w

The original air.f90 compiled with -fwhole-file gives

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 -fwhole-file air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.358u 0.049s 0:08.42 99.6%     0+0k 0+8io 0pf+0w

compared to

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.273u 0.046s 0:08.32 99.8%     0+0k 0+0io 0pf+0w


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
