http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #3 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 
2011-11-15 12:19:59 UTC ---
(In reply to comment #1)
> I have a cunning plan.

It is possible to get within a factor of 2 of highly optimized implementations
with a cache-oblivious matrix multiply, which is relatively easy to code. I'm
not sure this is worth the effort.
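For illustration, here is a minimal sketch of a cache-oblivious multiply in C. It assumes row-major `double` storage with explicit leading dimensions. gfortran's MATMUL works on column-major Fortran arrays, but the recursion is the same. The names `mm_base`, `mm_rec` and `CO_BASE` are hypothetical, and the cutoff of 32 is an untuned guess. The routine recursively halves the largest of the three dimensions. At some recursion depth, each subproblem fits in each cache level without the code knowing the cache sizes.

```c
#include <stddef.h>

/* Hypothetical cutoff below which a plain triple loop is used (untuned). */
#define CO_BASE 32

/* Naive kernel: C[m x n] += A[m x k] * B[k x n], row-major,
   with leading dimensions lda, ldb, ldc. */
static void mm_base(size_t m, size_t n, size_t k,
                    const double *A, size_t lda,
                    const double *B, size_t ldb,
                    double *C, size_t ldc)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t p = 0; p < k; ++p) {
            double a = A[i * lda + p];
            for (size_t j = 0; j < n; ++j)
                C[i * ldc + j] += a * B[p * ldb + j];
        }
}

/* Cache-oblivious recursion: split the largest of m, n, k in half
   until all three are small, then fall back to the naive kernel. */
void mm_rec(size_t m, size_t n, size_t k,
            const double *A, size_t lda,
            const double *B, size_t ldb,
            double *C, size_t ldc)
{
    if (m <= CO_BASE && n <= CO_BASE && k <= CO_BASE) {
        mm_base(m, n, k, A, lda, B, ldb, C, ldc);
    } else if (m >= n && m >= k) {
        size_t h = m / 2;   /* split rows of A and C */
        mm_rec(h, n, k, A, lda, B, ldb, C, ldc);
        mm_rec(m - h, n, k, A + h * lda, lda, B, ldb, C + h * ldc, ldc);
    } else if (n >= k) {
        size_t h = n / 2;   /* split columns of B and C */
        mm_rec(m, h, k, A, lda, B, ldb, C, ldc);
        mm_rec(m, n - h, k, A, lda, B + h, ldb, C + h, ldc);
    } else {
        size_t h = k / 2;   /* split inner dimension; both halves accumulate into C */
        mm_rec(m, n, h, A, lda, B, ldb, C, ldc);
        mm_rec(m, n, k - h, A + h, lda, B + h * ldb, ldb, C, ldc);
    }
}
```

For contiguous matrices, call it as `mm_rec(m, n, k, A, k, B, n, C, n)`. C must be initialized first (for example, zeroed) because the routine accumulates into it. In a serious version, most of the tuning effort would go into the base-case kernel.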

I believe it would be more important to have truly efficient (inlined)
implementations for very small matrices. These would outperform general-purpose
libraries by a large factor. For CP2K I have written a generator for a
specialized small-matrix multiply library. The generated code outperforms, for
example, MKL by a large factor for small matrices (<<32x32). However, the
generation time and the library size rule it out as a general-purpose tool. It
also contains an implementation of some form of recursive multiply (see
http://cvs.berlios.de/cgi-bin/viewvc.cgi/cp2k/cp2k/tools/build_libsmm/).
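The sketch below illustrates why fixed-size, inlinable kernels can beat a general library on tiny matrices. It is hand-written and is not the output of the CP2K generator. The macro `DEFINE_SMM` and the `smm_MxNxK` names are hypothetical, and the operands are assumed to be contiguous row-major `double` arrays. Because M, N and K are compile-time constants, the compiler can fully unroll and vectorize the loops and inline the call at the use site. A general-purpose library has to pay for argument checking and dispatch on every call.

```c
#include <stddef.h>

/* Hypothetical macro: defines static inline smm_MxNxK computing
   C[M x N] += A[M x K] * B[K x N] for contiguous row-major operands.
   Constant loop bounds let the compiler unroll and vectorize fully. */
#define DEFINE_SMM(M, N, K)                                            \
    static inline void smm_##M##x##N##x##K(const double *restrict A,   \
                                           const double *restrict B,   \
                                           double *restrict C)         \
    {                                                                  \
        for (int i = 0; i < (M); ++i)                                  \
            for (int p = 0; p < (K); ++p)                              \
                for (int j = 0; j < (N); ++j)                          \
                    C[i * (N) + j] += A[i * (K) + p] * B[p * (N) + j]; \
    }

/* One instantiation per size the application actually uses. */
DEFINE_SMM(2, 2, 2)
DEFINE_SMM(5, 5, 5)
DEFINE_SMM(13, 5, 7)
```

Each needed size gets its own kernel, so the amount of generated code grows with the number of (M, N, K) combinations. This is one reason such an approach does not scale to a general-purpose tool.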
