https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-08-09
      Known to fail|                            |12.1.0
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
            Version|unknown                     |10.3.0
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed, also with gfortran 12. The issue is that with the combined
matmul+transpose we invoke matmul with an array descriptor representing the
transpose operation, which results in suboptimal memory access patterns.
Can you check whether ifort does the transpose separately or whether its
matmul library routine simply special-cases the situation?
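
To illustrate the access-pattern point, here is a minimal C sketch. It assumes
a simplified two-dimensional descriptor; the struct and the helper names
(desc2d, make_colmajor, transpose_view, at, pack) are hypothetical and are not
libgfortran's actual descriptor layout or API. A transpose expressed only in
the descriptor swaps the per-dimension strides, so walking down a "column" of
the transposed view steps through memory with the original leading dimension
as stride instead of stride 1. Packing the view into contiguous storage first
is roughly what "doing the transpose separately" would amount to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified 2-D array descriptor (NOT libgfortran's layout). */
typedef struct {
    const double *base;    /* first element of the underlying storage */
    ptrdiff_t stride[2];   /* element stride for dimension 0 and dimension 1 */
    ptrdiff_t extent[2];   /* number of elements along each dimension */
} desc2d;

/* Descriptor for a contiguous column-major (Fortran-order) rows x cols array. */
static desc2d make_colmajor(const double *base, ptrdiff_t rows, ptrdiff_t cols)
{
    desc2d d = { base, { 1, rows }, { rows, cols } };
    return d;
}

/* Element (i, j) of the array the descriptor refers to, zero-based. */
static double at(const desc2d *d, ptrdiff_t i, ptrdiff_t j)
{
    return d->base[i * d->stride[0] + j * d->stride[1]];
}

/* Transpose without moving data: swap strides and extents only. */
static desc2d transpose_view(desc2d d)
{
    desc2d t = { d.base,
                 { d.stride[1], d.stride[0] },
                 { d.extent[1], d.extent[0] } };
    return t;
}

/* Copy a (possibly strided) view into contiguous column-major storage.
   out must hold extent[0] * extent[1] doubles. */
static void pack(const desc2d *d, double *out)
{
    assert(out != NULL);
    for (ptrdiff_t j = 0; j < d->extent[1]; j++)
        for (ptrdiff_t i = 0; i < d->extent[0]; i++)
            out[i + j * d->extent[0]] = at(d, i, j);
}
```

With a contiguous rows x cols input, the transposed view has stride[0] == rows,
so an inner loop over its first dimension is strided. A matmul kernel that
assumes unit stride in that dimension would either take a slower generic path
or have to pack its operand first.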