https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
If dot_product (matmul (...), ..) can be implemented more optimally (is there a BLAS/LAPACK primitive for it?), then the best course of action is to pattern-match that inside the frontend and emit a library call to an optimized routine (which means eventually adding one to libfortran or using/extending -fexternal-blas).

Recovering from this in the middle-end is only possible if both primitives are inlined, and even then I expect it to be quite difficult to get optimal code out of it (though it's certainly interesting to see if we're at least getting a useful idea of data dependence).

Long-term, exposing the semantics of important primitives to the middle-end, even when they are implemented as library calls, would be interesting (that is, adding __builtin_dot_product etc., which would also make it possible to delay inline expansion).
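To make the suggested pattern match more concrete: for a matrix A and vectors x and y, dot_product(matmul(A, x), y) equals y^T A x, so a dedicated routine can evaluate it in one pass without materializing the temporary vector matmul(A, x). The C sketch below is hypothetical. The function names, the double-precision real (non-complex) element type, and the Fortran-style column-major layout are assumptions for illustration; neither function is an existing libgfortran or BLAS entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Unfused reference (hypothetical name): materialize t = matmul(A, x),
   then return dot_product(t, y).
   A is m-by-n in Fortran column-major order, so A(i,j) is a[i + j*m]. */
static double dot_matmul_naive(const double *a, const double *x,
                               const double *y, size_t m, size_t n)
{
    /* The temporary vector that a fused routine avoids allocating. */
    double *t = calloc(m ? m : 1, sizeof *t);
    if (t == NULL)
        abort();
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            t[i] += a[i + j * m] * x[j];
    double s = 0.0;
    for (size_t i = 0; i < m; i++)
        s += t[i] * y[i];
    free(t);
    return s;
}

/* Hypothetical fused routine: evaluates y^T (A x) as (y^T A) x in a single
   pass over A, one contiguous column at a time, with no temporary array. */
static double dot_matmul_fused(const double *a, const double *x,
                               const double *y, size_t m, size_t n)
{
    assert(a != NULL && x != NULL && y != NULL);
    double s = 0.0;
    for (size_t j = 0; j < n; j++) {
        const double *col = a + j * m; /* column j of A */
        double c = 0.0;                /* c = (y^T A)_j */
        for (size_t i = 0; i < m; i++)
            c += y[i] * col[i];
        s += c * x[j];
    }
    return s;
}
```

Called with A = [1 2 3; 4 5 6] stored column-major as {1, 4, 2, 5, 3, 6}, x = {1, 0, -1} and y = {2, 1}, both functions return -6. The fused form adds the partial products in a different order, so for general floating-point inputs the two results can differ by rounding.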