http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

I'm trying to reproduce it. Can you verify on your side whether dropping -ftree-loop-linear changes anything with respect to the regression? Also, what do the numbers for (6) -Ofast -funroll-loops -fwhole-program look like? Because if you factor in LTO, then you should compare against a revision that includes

2013-04-26  Richard Biener  <rguent...@suse.de>

        * Makefile.in (lto-streamer-in.o): Add $(CFGLOOP_H) dependency.
        (lto-streamer-out.o): Likewise.
        * cfgloop.c (init_loops_structure): Export, add struct function
        argument and adjust.
        (flow_loops_find): Adjust.
        * cfgloop.h (enum loop_estimation): Add EST_LAST.
        (init_loops_structure): Declare.
        * lto-streamer-in.c: Include cfgloop.h.
        (input_cfg): Input the loop tree.
        * lto-streamer-out.c: Include cfgloop.h.
        (output_cfg): Output the loop tree.
        (output_struct_function_base): Do not drop PROP_loops.

I see

(1) -Ofast -funroll-loops -fomit-frame-pointer -fwhole-program -flto
(2) -Ofast -funroll-loops -fomit-frame-pointer -fwhole-program -flto -fprotect-parens

revision:   198332      198333
(1)         15.5+-.3    15.6+-.2
(2)         16.1+-.1    15.9+-.2

Note that the PAREN_EXPR thing made me point at the extra copyprop pass. So there is a difference between -f[no-]protect-parens, but between the revs I cannot see a regression.

Are you testing 64bit or 32bit executables? On Intel or PPC?

Since you noted the non-monotonic behavior wrt inlining decisions, it would be interesting to know whether those differ for you for (5) between rev. 198332 and 198333. Add -fdump-ipa-inline to the command-line and inspect the aermod.f90.wpa.047i.inline dumpfile, grepping for 'Inlined into'. I only see changes in estimated time/size but no real code changes. I do see code layout changes though, and changes in LTRANS due to the extra copyprop pass.

Note that if -flto makes things worse compared to just -fwhole-program (which it slightly does for me), then this is probably due to partitioning. So you may also want to check -flto -flto-partition=none (slightly easier to debug in the end, but without LTO it would be easiest).
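
For comparing the inline dumps of the two revisions, something like the following Python sketch might save some manual grepping. It is only an illustration: the helper names (inlined_lines, diff_inline_decisions) and the two dump paths given on the command line are assumptions, not part of GCC. It keeps only lines containing 'Inlined into' and diffs them verbatim. If those lines also carry time/size estimates, changes in the estimates alone will show up as differences too.

```python
import difflib
import sys


def inlined_lines(dump_text):
    """Return the stripped lines of a dump that contain 'Inlined into'."""
    return [line.strip() for line in dump_text.splitlines()
            if "Inlined into" in line]


def diff_inline_decisions(old_text, new_text):
    """Unified diff of the 'Inlined into' lines of two dumps (empty if equal)."""
    return list(difflib.unified_diff(
        inlined_lines(old_text), inlined_lines(new_text),
        fromfile="r198332", tofile="r198333", lineterm=""))


# Hypothetical usage: script.py <dump from r198332> <dump from r198333>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for line in diff_inline_decisions(f_old.read(), f_new.read()):
            print(line)
```

An empty result would mean the 'Inlined into' lines are textually identical between the two dumps.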