https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70046
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I've confirmed the regression to be caused by r230647

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        180       75.7 *   13590        198       68.6 *

with BASE on r230646 and PEAK on r230647 using -Ofast -march=haswell on an
Intel(R) Core(TM) i5-4670T

I can even reproduce the difference w/o any -march, thus with just -Ofast:

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        176       77.1 *   13590        199       68.5 *

As expected, the difference is in mat_times_vec_

Samples: 1M of event 'cycles', Event count (approx.): 1280690858409
  39.22%  bwaves_peak.amd  bwaves_peak.amd64-m64-gcc42-nn  [.] mat_times_vec_
  33.60%  bwaves_base.amd  bwaves_base.amd64-m64-gcc42-nn  [.] mat_times_vec_

IV differences are a mixed bag, but the number of IVs is different for the
nest, the slower case having many more IVs in the inner loop.
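
For readers who want the size of the regression as a percentage, here is a minimal sketch that derives it from the run times reported above. The run times are copied from the two SPEC tables; the helper name slowdown_percent is only illustrative and not part of any SPEC or GCC tooling.

```python
# Run times in seconds (base = r230646, peak = r230647),
# copied from the two SPEC tables in this comment.
runs = {
    "-Ofast -march=haswell": (180, 198),
    "-Ofast": (176, 199),
}


def slowdown_percent(base_seconds, peak_seconds):
    """Relative increase in run time going from base to peak, in percent."""
    return (peak_seconds - base_seconds) / base_seconds * 100.0


for flags, (base, peak) in runs.items():
    print(f"{flags}: {slowdown_percent(base, peak):.1f}% slower")
```

With these numbers the regression works out to roughly 10% with -march=haswell and roughly 13% with plain -Ofast. The reported run times are rounded to whole seconds, so the percentages are approximate.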