https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
--- Comment #22 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Thomas Koenig from comment #21)
> I assume that for small matrices bordering on the silly
> (say, a matrix multiplication with dimensions of (1,2) and (2,1))
> the inline code will be faster if the code is compiled with the
> right options, due to function call overhead. I also assume that
> libxsmm will become faster quite soon for bigger sizes.
>
> Do you have an idea where the crossover is?

I agree that inline should be faster, if the compiler is reasonably smart, if
the matrix dimensions are known at compile time (i.e. should be able to
generate the same kernel). I haven't checked yet.
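
To illustrate the point about compile-time dimensions, here is a minimal sketch in C (not the Fortran front end's inlined MATMUL and not libxsmm). The names matmul_fixed and matmul_generic and the DIM_* constants are hypothetical, chosen only for this example. With the (1,2) x (2,1) shapes fixed at compile time, an optimizing compiler can in principle unroll the loops completely. The generic routine only sees the sizes at run time, so it pays the call and loop overhead. Where the crossover lies would still have to be measured.

```c
#include <stddef.h>

/* Hypothetical compile-time dimensions: (1,2) times (2,1). */
enum { DIM_M = 1, DIM_K = 2, DIM_N = 1 };

/* Fixed-size kernel: all loop bounds are constants, so the compiler
   is free to unroll the loops and inline the whole body. */
static inline void matmul_fixed(const double a[DIM_M][DIM_K],
                                const double b[DIM_K][DIM_N],
                                double c[DIM_M][DIM_N])
{
    for (size_t i = 0; i < DIM_M; ++i) {
        for (size_t j = 0; j < DIM_N; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < DIM_K; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* Generic routine standing in for a library call: dimensions are only
   known at run time, matrices are stored row-major in flat arrays. */
void matmul_generic(size_t m, size_t k, size_t n,
                    const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

Both routines compute the same product. Only the amount of information available to the compiler differs.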