https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #23 from Andrew Roberts <andrewm.roberts at sky dot com> ---
Thanks Honza, getting closer, with original matrix.c on Ryzen:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 364850 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -O3 matrix.c -o matrix
mult took 194517 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -O3 matrix.c -o matrix
mult took 130343 clocks

/usr/local/gcc/bin/gcc -march=haswell -mtune=haswell -mprefer-vector-width=none -mno-fma -O3 matrix.c -o matrix
mult took 130129 clocks

These last two are comparable with the fastest obtained from trying all combinations of -march and -mtune