https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70046

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I've confirmed that the regression is caused by r230647:

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        180       75.7 *   13590        198       68.6 *  

with BASE on r230646 and PEAK on r230647, using -Ofast -march=haswell on an
Intel(R) Core(TM) i5-4670T.

I can even reproduce the difference without any -march, i.e. with just -Ofast:

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        176       77.1 *   13590        199       68.5 *  


As expected, the difference is in mat_times_vec_:

Samples: 1M of event 'cycles', Event count (approx.): 1280690858409             
 39.22%  bwaves_peak.amd  bwaves_peak.amd64-m64-gcc42-nn  [.] mat_times_vec_
 33.60%  bwaves_base.amd  bwaves_base.amd64-m64-gcc42-nn  [.] mat_times_vec_


The IV (induction variable) differences are a mixed bag, but the number of IVs
differs for the loop nest, with the slower case having many more IVs in the
inner loop.
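
To illustrate what "more IVs in the inner loop" can mean, here is a minimal
sketch in C. It is not the bwaves source (mat_times_vec_ is a Fortran routine
with a deeper loop nest) and it is not compiler output; the function names
matvec_single_iv and matvec_many_ivs are hypothetical. Both variants compute
the same matrix-vector product. The first keeps a single inner-loop IV and
derives the addresses from it; the second is written by hand to carry three
separately updated IVs, which means more per-iteration updates and higher
register pressure.

```c
#include <stddef.h>

/* Variant A: the inner loop has one induction variable, j.
 * Both addresses are computed from j (and the loop-invariant i * n). */
void matvec_single_iv(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Variant B: same result, but the inner loop carries three induction
 * variables that are each updated on every iteration: two running
 * pointers and a countdown counter. Illustrative assumption only. */
void matvec_many_ivs(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        const double *ap = a + i * n;   /* IV 1: walks row i of a */
        const double *xp = x;           /* IV 2: walks x */
        double sum = 0.0;
        for (size_t left = n; left != 0; left--) {  /* IV 3: trip counter */
            sum += *ap * *xp;
            ap++;
            xp++;
        }
        y[i] = sum;
    }
}
```

Which form is faster depends on the target and on what else competes for
registers in the loop, so this only shows the shape of the difference, not
its cost.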
