https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Currently, we only inline statements of the form a = matmul(b,c) so the more complex expressions in your code are not inlined (and thus slow). This is a known limitation, which will not be fixed in time for gcc 7. Maybe 8...

If you want to use matmul, you would need to insert temporaries by hand. Also make sure to add flags which allow reassociation (such as -Ofast); otherwise the optimizer might not work well.
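
A minimal sketch of what inserting a temporary by hand could look like. The reporter's code is not quoted in this comment, so the nested product, the array names (a, b, c, d, tmp, ref) and the size n are all hypothetical, chosen only for illustration:

```fortran
program matmul_temporaries
  implicit none
  integer, parameter :: n = 4          ! hypothetical size, for illustration only
  real :: a(n,n), b(n,n), c(n,n)
  real :: d(n,n), tmp(n,n), ref(n,n)

  call random_number(a)
  call random_number(b)
  call random_number(c)

  ! Nested expression: not of the simple form x = matmul(y,z), so
  ! according to the comment above it would not be inlined.
  ref = matmul(a, matmul(b, c))

  ! Same product with a hand-inserted temporary: each statement now
  ! has the simple form x = matmul(y,z).
  tmp = matmul(b, c)
  d   = matmul(a, tmp)

  ! Both versions should agree up to rounding.
  print *, 'max abs difference:', maxval(abs(d - ref))
end program matmul_temporaries
```

Saved as, say, example.f90 (file name assumed), this could be built with something like "gfortran -Ofast example.f90", following the flag suggestion above. Whether the split version is actually faster would need to be measured for the array sizes in question.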