http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---

There is a lot of noise in these numbers(?)

Well, AFAICT aermod.f90 has a "non-monotonic" behavior for the different
optimizations: when playing with --param max-inline-insns-auto=xx, the
execution time was not decreasing for increasing xx, but went up or down
depending on which routine was inlined.

> the patch, apart from
>
> + * passes.c (init_optimization_passes): Schedule a copy-propagation
> + pass before complete unrolling of inner loops.
>
> should have had no effect on performance (well, in theory, that is).
> Can you check whether reverting the above part changes the results?

Nope

> Also, what's the variance of the numbers?

Below 0.1s.

> Are (1) to (4) effectively
> the same performance r198332 vs. r198333?

Yes for (2) and (4). For (1) and (3), I think the performances are slightly
different. What triggered this PR is (5) (can you reproduce it?) versus (3),
i.e., -fprotect-parens versus -fwhole-program -flto.

> (make sure to disable
> address-space randomization for benchmarking)

I don't really know what you are talking about (I am using Darwin).

Profiling the executable obtained with

-fprotect-parens -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto

gives

- 21.8%, iblval_.lto_priv.516, a.out
- 12.7%, sigz_.lto_priv.419, a.out
- 12.7%, powf$fenv_access_off, libSystem.B.dylib
  12.4%, anyavg_.constprop.50, a.out
- 5.6%, plumef_.lto_priv.580, a.out

and with

-Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto:

- 14.7%, powf$fenv_access_off, libSystem.B.dylib
- 14.5%, iblval_.lto_priv.284, a.out
- 13.8%, sigz_.lto_priv.290, a.out
  13.7%, anyavg_.constprop.50, a.out
- 4.8%, refl_ht_.lto_priv.281, a.out
- 4.7%, rmssig_.lto_priv.298, a.out
  3.1%, _gfortran_compare_string, libgfortran.3.dylib

The subroutine takes ~4.5s for the first set of options and ~2.6s for the
second one.