https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #23 from Andrew Roberts <andrewm.roberts at sky dot com> ---
Thanks Honza,
Getting closer. With the original matrix.c on Ryzen:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 364850 clocks
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -O3 matrix.c -o matrix
mult took 194517 clocks
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -O3 matrix.c -o matrix
mult took 130343 clocks
/usr/local/gcc/bin/gcc -march=haswell -mtune=haswell -mprefer-vector-width=none -mno-fma -O3 matrix.c -o matrix
mult took 130129 clocks
The last two results are comparable to the fastest times obtained by trying
all combinations of -march and -mtune.
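
To make the relative gains easier to read, here is a small sketch that turns the four clock counts quoted above into speedups over the plain -march=znver1 -O3 build. The numbers are copied from this comment; the labels, the `runs` list and the `speedups` helper are only illustrative names, and each figure is the single value reported here, so treat the ratios as rough.

```python
# Clock counts copied from the four runs reported in this comment.
# The labels are shorthand for the gcc flags used (all builds use -O3).
runs = [
    ("znver1", 364850),
    ("znver1 -mprefer-vector-width=none", 194517),
    ("znver1 -mprefer-vector-width=none -mno-fma", 130343),
    ("haswell -mprefer-vector-width=none -mno-fma", 130129),
]


def speedups(results):
    """Return (label, clocks, speedup) with speedup relative to the first entry."""
    baseline = results[0][1]
    return [(label, clocks, baseline / clocks) for label, clocks in results]


for label, clocks, ratio in speedups(runs):
    print(f"{clocks:>7} clocks  {ratio:4.2f}x  {label}")
```

On these figures, -mprefer-vector-width=none alone gives roughly 1.88x, and adding -mno-fma brings both the znver1 and haswell builds to roughly 2.80x over the baseline.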