https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #25 from Jan Hubicka <hubicka at ucw dot cz> ---
Hi,
I agree that the matrix multiplication FMA issue is important, and hopefully it will be fixed for GCC 8. See
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00437.html

The irregularity across tune/arch settings probably originates from enabling/disabling FMA and the 256-bit AVX preference. I get:

  jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma mult.c
  jh@d136:~> ./a.out
  mult took 193593 clocks
  jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=native -mno-fma -mprefer-vector-width=256 mult.c
  jh@d136:~> ./a.out
  mult took 104745 clocks
  jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell -mprefer-vector-width=256 mult.c
  jh@d136:~> ./a.out
  mult took 160123 clocks
  jh@d136:~> /home/jh/trunk-install-new3/bin/gcc -Ofast -march=haswell -mprefer-vector-width=256 -mno-fma mult.c
  jh@d136:~> ./a.out
  mult took 102048 clocks

A 90% difference on a common loop is quite noticeable.

Continuing my benchmarking on SPEC2000. This is -Ofast -march=native -mprefer-vector-width=none compared to -Ofast -march=native -mtune=haswell -mprefer-vector-width=128. So neither of those is a win compared to -mtune=native.

                     ---------- base ----------   ---------- peak ----------
  Benchmark          Ref time  Run time   Ratio   Ref time  Run time   Ratio
  164.gzip               1400      58.2    2407       1400      57.9    2419
  175.vpr                1400      37.5    3731       1400      37.8    3704
  176.gcc                1100      20.0    5494       1100      20.0    5497
  181.mcf                1800      21.6    8324       1800      20.8    8660
  186.crafty             1000      20.9    4790       1000      21.2    4722
  197.parser             1800      51.4    3499       1800      51.8    3472
  252.eon                1300      19.3    6749       1300      18.2    7143
  253.perlbmk                                 X                            X
  254.gap                                     X                            X
  255.vortex                                  X                            X
  256.bzip2              1500      43.1    3483       1500      43.5    3444
  300.twolf              3000      56.6    5302       3000      57.0    5267
  Est. SPECint_base2000  4563
  Est. SPECint2000       4591

                     ---------- base ----------   ---------- peak ----------
  Benchmark          Ref time  Run time   Ratio   Ref time  Run time   Ratio
  168.wupwise            1600      30.9    5179       1600      29.7    5387
  171.swim               3100      27.4   11309       3100      26.4   11739
  172.mgrid              1800      31.0    5814       1800      26.1    6895
  173.applu              2100      25.7    8175       2100      25.9    8096
  177.mesa               1400      23.3    6006       1400      23.3    6001
  178.galgel                                  X                            X
  179.art                2600      11.0   23702       2600      11.0   23718
  183.equake             1300      13.0   10033       1300      13.1    9944
  187.facerec            1900      24.0    7931       1900      17.2   11040
  188.ammp               2200      34.4    6394       2200      35.2    6249
  189.lucas              2000      20.3    9864       2000      20.8    9603
  191.fma3d              2100      31.4    6686       2100      30.0    7011
  200.sixtrack           1100      41.7    2641       1100      38.5    2856
  301.apsi               2600      34.1    7630       2600      34.2    7612
  Est. SPECfp_base2000   7570
  Est. SPECfp2000        7947