https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66189

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Dominique d'Humieres from comment #1)
> IMO the matmul inlining should be restricted to small matrices, thus I am
> not convinced that this is worth the work.

For large matrix sizes, an external optimized BLAS is faster.  This is why
inline matmul hands over to the external BLAS by default.

Our current library implementation is slower than inline matmul, so inlining
still makes sense when the user does not specify -fexternal-blas, and it is
worth making the inlined code fast.
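
As a rough illustration of the hand-over described above, here is a minimal
sketch in C of a size-based dispatch.  It is not the code gfortran generates:
SMALL_LIMIT, matmul_inline, matmul_external and matmul_dispatch are
hypothetical names, and matmul_external is only a stand-in for a call into an
optimized BLAS.  Matrices are assumed to be square and stored row-major.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff, chosen for illustration only. */
enum { SMALL_LIMIT = 30 };

/* Plain triple loop, standing in for the code emitted inline. */
static void matmul_inline(size_t n, const double *a, const double *b,
                          double *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Stand-in for an external optimized BLAS routine (assumption: a real
   build would call into the BLAS library here instead). */
static void matmul_external(size_t n, const double *a, const double *b,
                            double *c)
{
    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;
    /* i-k-j order walks b and c row by row. */
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Small matrices take the inline path, larger ones are handed over.
   Returns 1 if the external path was taken, 0 otherwise. */
static int matmul_dispatch(size_t n, const double *a, const double *b,
                           double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    if (n < SMALL_LIMIT) {
        matmul_inline(n, a, b, c);
        return 0;
    }
    matmul_external(n, a, b, c);
    return 1;
}
```

With this sketch, matmul_dispatch(2, a, b, c) stays on the inline path, while
n = 32 is handed to the external stand-in.  Without an external BLAS, every
size takes the first path, which is why its speed matters.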
