http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810

--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-09-29 20:27:36 UTC ---
(In reply to comment #8)
> Using -fno-inline-functions, the program recovers the speed of the no-LTO
> version.

This is weird!-( The profiles below show that -flto prevents the inlining of
__perdida_m_MOD_perdida, while adding -fno-inline-functions restores it. This
is the opposite of what the manual says about -finline-functions:

-finline-functions
Integrate all simple functions into their callers. The compiler heuristically
decides which functions are simple enough to be worth integrating in this way.

Note also that inlining __perdida_m_MOD_generalized_hookes_law requires raising
the inline limit: -finline-limit=600 works (the actual threshold is some number
between 300 and 400).


[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.547u 0.024s 0:06.57 99.8%    0+0k 0+2io 0pf+0w

+ 70.8%, MAIN__, a.out
| + 10.1%, free, libSystem.B.dylib
| |   7.9%, szone_size, libSystem.B.dylib
| + 8.0%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | |   4.4%, szone_malloc_should_clear, libSystem.B.dylib
| | |   0.4%, szone_malloc, libSystem.B.dylib
| |   0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| |   0.1%, szone_malloc_should_clear, libSystem.B.dylib
|   4.1%, szone_free_definite_size, libSystem.B.dylib
|   2.4%, cosisin, libSystem.B.dylib
| + 0.7%, cexp, libSystem.B.dylib
| |   0.1%, exp$fenv_access_off, libSystem.B.dylib
| |   0.0%, dyld_stub_exp, libSystem.B.dylib
  27.2%, __perdida_m_MOD_generalized_hookes_law, a.out
  0.5%, dyld_stub_malloc, a.out
  0.4%, free, libSystem.B.dylib
  0.4%, dyld_stub_free, a.out
  0.4%, szone_free_definite_size, libSystem.B.dylib
  0.2%, malloc, libSystem.B.dylib
  0.1%, dyld_stub_cexp, a.out
  0.0%, cexp, libSystem.B.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
9.013u 0.027s 0:09.04 99.8%    0+0k 0+2io 0pf+0w

+ 64.8%, __perdida_m_MOD_perdida, a.out    <-------
| + 6.8%, free, libSystem.B.dylib
| |   4.9%, szone_size, libSystem.B.dylib
| + 5.2%, malloc, libSystem.B.dylib
| | + 4.1%, malloc_zone_malloc, libSystem.B.dylib
| | |   2.5%, szone_malloc_should_clear, libSystem.B.dylib
| | |   0.5%, szone_malloc, libSystem.B.dylib
| |   0.3%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
|   3.1%, szone_free_definite_size, libSystem.B.dylib
  19.3%, __perdida_m_MOD_generalized_hookes_law, a.out
+ 14.6%, MAIN__.2130, a.out
|   1.8%, cosisin, libSystem.B.dylib
| + 0.4%, cexp, libSystem.B.dylib
| |   0.1%, exp$fenv_access_off, libSystem.B.dylib
| |   0.0%, dyld_stub_exp, libSystem.B.dylib
| |   0.0%, cosisin, libSystem.B.dylib
  0.3%, szone_free_definite_size, libSystem.B.dylib
  0.3%, dyld_stub_malloc, a.out
  0.3%, dyld_stub_free, a.out
  0.2%, free, libSystem.B.dylib
  0.2%, malloc, libSystem.B.dylib
  0.0%, cexp, libSystem.B.dylib
  0.0%, data_transfer_init, libgfortran.3.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto -fno-inline-functions fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.575u 0.021s 0:06.61 99.6%    0+0k 0+2io 0pf+0w

+ 71.0%, MAIN__.2130, a.out
| + 8.9%, free, libSystem.B.dylib
| |   6.6%, szone_size, libSystem.B.dylib
| + 8.1%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | |   4.5%, szone_malloc_should_clear, libSystem.B.dylib
| | |   0.6%, szone_malloc, libSystem.B.dylib
| |   0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| |   0.2%, szone_malloc_should_clear, libSystem.B.dylib
|   4.4%, szone_free_definite_size, libSystem.B.dylib
|   1.9%, cosisin, libSystem.B.dylib
| + 1.0%, cexp, libSystem.B.dylib
| |   0.1%, exp$fenv_access_off, libSystem.B.dylib
| |   0.1%, cosisin, libSystem.B.dylib
| |   0.0%, dyld_stub_exp, libSystem.B.dylib
  27.3%, __perdida_m_MOD_generalized_hookes_law, a.out
  0.4%, free, libSystem.B.dylib
  0.3%, dyld_stub_malloc, a.out
  0.3%, dyld_stub_free, a.out
  0.3%, szone_free_definite_size, libSystem.B.dylib
  0.2%, malloc, libSystem.B.dylib
  0.1%, dyld_stub_cexp, a.out
  0.0%, cexp, libSystem.B.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto -finline-limit=600 fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.768u 0.018s 0:04.79 99.5%    0+0k 0+1io 0pf+0w

+ 97.5%, MAIN__.2133, a.out
| + 15.4%, free, libSystem.B.dylib
| |   10.6%, szone_size, libSystem.B.dylib
| + 11.4%, malloc, libSystem.B.dylib
| | + 9.6%, malloc_zone_malloc, libSystem.B.dylib
| | |   4.9%, szone_malloc_should_clear, libSystem.B.dylib
| | |   0.9%, szone_malloc, libSystem.B.dylib
| |   0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
|   6.4%, szone_free_definite_size, libSystem.B.dylib
|   2.7%, cosisin, libSystem.B.dylib
| + 0.8%, cexp, libSystem.B.dylib
| |   0.1%, exp$fenv_access_off, libSystem.B.dylib
| |   0.1%, cosisin, libSystem.B.dylib
| |   0.0%, dyld_stub_exp, libSystem.B.dylib
  0.5%, szone_free_definite_size, libSystem.B.dylib
  0.5%, dyld_stub_malloc, a.out
  0.5%, dyld_stub_free, a.out
  0.4%, free, libSystem.B.dylib
  0.4%, malloc, libSystem.B.dylib
  0.1%, dyld_stub_cexp, a.out
