On 15.06.2011 23:22, Christopher Barker wrote:
> I think the issue got confused -- the OP was not looking to speed up a
> matrix multiply, but rather to speed up a whole bunch of independent
> matrix multiplies.
I would do it like this:

1. Write a Fortran function that makes multiple calls to DGEMM in a do loop. (Or use the Fortran intrinsics dot_product or matmul.)

2. Put an OpenMP directive around the loop (!$omp parallel do). Compile with OpenMP enabled. Use static or guided thread scheduling.

3. Call Fortran from Python using f2py, ctypes or Cython.

Build with a BLAS library that is thread-safe and single-threaded, so the parallelism comes only from the OpenMP loop.

That should run as fast as it gets.

Sturla
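For checking the output of a wrapped Fortran routine like the one in step 1, a plain NumPy reference can help. The sketch below is not the fast path described above: it only spells out what the loop computes, one independent product per iteration. The function name `batched_matmul_reference` and the stacked layout `(n, m, k)` times `(n, k, p)` are assumptions made for illustration and do not come from the original post.

```python
import numpy as np


def batched_matmul_reference(a, b):
    """Return the stack of products a[i] @ b[i] (hypothetical helper).

    a has shape (n, m, k) and b has shape (n, k, p); the result has
    shape (n, m, p). Slow but simple, intended only as a correctness check.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 3 or b.ndim != 3:
        raise ValueError("expected two stacks of matrices (3-D arrays)")
    if a.shape[0] != b.shape[0] or a.shape[2] != b.shape[1]:
        raise ValueError("stack sizes or inner dimensions do not match")

    out = np.empty((a.shape[0], a.shape[1], b.shape[2]), dtype=np.float64)
    for i in range(a.shape[0]):
        # Each iteration touches only slice i, so the iterations are
        # independent; this is the loop the OpenMP directive would split
        # across threads in the Fortran version.
        out[i] = np.dot(a[i], b[i])
    return out
```

Comparing this against the wrapped routine with `np.allclose` on a few random stacks is usually enough to catch mistakes in array ordering (Fortran is column-major, NumPy defaults to row-major).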
