http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-09-29 20:27:36 UTC ---
(In reply to comment #8)
> Using -fno-inline-functions, the program recovers the speed of the no-LTO
> version.

This is weird!-( I have done the following profiling and it shows that -flto
prevents the inlining of __perdida_m_MOD_perdida, while -fno-inline-functions
restores it. This contradicts what the manual says:

-finline-functions
    Integrate all simple functions into their callers. The compiler
    heuristically decides which functions are simple enough to be worth
    integrating in this way.

Note also that in order to inline __perdida_m_MOD_generalized_hookes_law one
needs -finline-limit=600 (actually some number between 300 and 400).

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.547u 0.024s 0:06.57 99.8% 0+0k 0+2io 0pf+0w

+ 70.8%, MAIN__, a.out
| + 10.1%, free, libSystem.B.dylib
| | 7.9%, szone_size, libSystem.B.dylib
| + 8.0%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.4%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.4%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| | 0.1%, szone_malloc_should_clear, libSystem.B.dylib
| 4.1%, szone_free_definite_size, libSystem.B.dylib
| 2.4%, cosisin, libSystem.B.dylib
| + 0.7%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
27.2%, __perdida_m_MOD_generalized_hookes_law, a.out
0.5%, dyld_stub_malloc, a.out
0.4%, free, libSystem.B.dylib
0.4%, dyld_stub_free, a.out
0.4%, szone_free_definite_size, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
0.0%, cexp, libSystem.B.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
9.013u 0.027s 0:09.04 99.8% 0+0k 0+2io 0pf+0w

+ 64.8%, __perdida_m_MOD_perdida, a.out <-------
| + 6.8%, free, libSystem.B.dylib
| | 4.9%, szone_size, libSystem.B.dylib
| + 5.2%, malloc, libSystem.B.dylib
| | + 4.1%, malloc_zone_malloc, libSystem.B.dylib
| | | 2.5%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.5%, szone_malloc, libSystem.B.dylib
| | 0.3%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| 3.1%, szone_free_definite_size, libSystem.B.dylib
19.3%, __perdida_m_MOD_generalized_hookes_law, a.out
+ 14.6%, MAIN__.2130, a.out
| 1.8%, cosisin, libSystem.B.dylib
| + 0.4%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
| | 0.0%, cosisin, libSystem.B.dylib
0.3%, szone_free_definite_size, libSystem.B.dylib
0.3%, dyld_stub_malloc, a.out
0.3%, dyld_stub_free, a.out
0.2%, free, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.0%, cexp, libSystem.B.dylib
0.0%, data_transfer_init, libgfortran.3.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto -fno-inline-functions fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.575u 0.021s 0:06.61 99.6% 0+0k 0+2io 0pf+0w

+ 71.0%, MAIN__.2130, a.out
| + 8.9%, free, libSystem.B.dylib
| | 6.6%, szone_size, libSystem.B.dylib
| + 8.1%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.5%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.6%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| | 0.2%, szone_malloc_should_clear, libSystem.B.dylib
| 4.4%, szone_free_definite_size, libSystem.B.dylib
| 1.9%, cosisin, libSystem.B.dylib
| + 1.0%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.1%, cosisin, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
27.3%, __perdida_m_MOD_generalized_hookes_law, a.out
0.4%, free, libSystem.B.dylib
0.3%, dyld_stub_malloc, a.out
0.3%, dyld_stub_free, a.out
0.3%, szone_free_definite_size, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
0.0%, cexp, libSystem.B.dylib

[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto -finline-limit=600 fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.768u 0.018s 0:04.79 99.5% 0+0k 0+1io 0pf+0w

+ 97.5%, MAIN__.2133, a.out
| + 15.4%, free, libSystem.B.dylib
| | 10.6%, szone_size, libSystem.B.dylib
| + 11.4%, malloc, libSystem.B.dylib
| | + 9.6%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.9%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.9%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| 6.4%, szone_free_definite_size, libSystem.B.dylib
| 2.7%, cosisin, libSystem.B.dylib
| + 0.8%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.1%, cosisin, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
0.5%, szone_free_definite_size, libSystem.B.dylib
0.5%, dyld_stub_malloc, a.out
0.5%, dyld_stub_free, a.out
0.4%, free, libSystem.B.dylib
0.4%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
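
For convenience, the four timings above can be put side by side. The following is only a small sketch for comparing them: the user CPU times are copied from the `time a.out` lines in this comment, while the dictionary labels and variable names are my own shorthand and not part of any tool's output.

```python
# User CPU times in seconds, copied from the four `time a.out` runs above.
# The keys are shorthand for the flags added to
# `-Ofast -funroll-loops -fwhole-program`; they are illustrative labels only.
user_times = {
    "no LTO (-g)": 6.547,
    "-flto": 9.013,
    "-flto -fno-inline-functions": 6.575,
    "-flto -finline-limit=600": 4.768,
}

# Express every run relative to the no-LTO build.
baseline = user_times["no LTO (-g)"]
relative = {flags: seconds / baseline for flags, seconds in user_times.items()}

for flags, ratio in relative.items():
    print(f"{flags:30s} {user_times[flags]:6.3f}s  {ratio:4.2f}x")
```

With these numbers, plain -flto is the only variant slower than the no-LTO build, and -finline-limit=600 is the only one clearly faster.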