On 15.06.2011 23:22, Christopher Barker wrote:
>
> I think the issue got confused -- the OP was not looking to speed up a
> matrix multiply, but rather to speed up a whole bunch of independent
> matrix multiplies.

I would do it like this:

1. Write a Fortran function that makes multiple calls to DGEMM in a do
loop. (Or use the Fortran intrinsics dot_product or matmul.)

2. Put an OpenMP directive around the loop (!$omp parallel do). Enable
OpenMP when compiling (e.g. -fopenmp with gfortran). Use static or
guided thread scheduling.

3. Call Fortran from Python using f2py, ctypes or Cython.

Link against a BLAS library that is thread-safe but single-threaded,
since the OpenMP loop already supplies the parallelism.
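
Steps 1 and 2 might look roughly like the sketch below. The subroutine
name batched_dgemm and the array layout (one matrix per slice along the
last axis) are my own assumptions for illustration, not something from
the original question; only DGEMM and the OpenMP directive are standard.

```fortran
! Hypothetical helper: computes c(:,:,i) = a(:,:,i) * b(:,:,i) for each i.
! The name and the (rows, cols, batch) layout are illustrative assumptions.
subroutine batched_dgemm(a, b, c, m, n, k, nbatch)
    implicit none
    integer, intent(in) :: m, n, k, nbatch
    double precision, intent(in)  :: a(m, k, nbatch)
    double precision, intent(in)  :: b(k, n, nbatch)
    double precision, intent(out) :: c(m, n, nbatch)
    integer :: i
    external :: dgemm   ! supplied by the BLAS library at link time

    ! Each iteration is an independent product, so the loop can be
    ! divided among threads. Static scheduling suits equal-sized work.
    !$omp parallel do schedule(static) default(shared) private(i)
    do i = 1, nbatch
        ! Reference BLAS: C := alpha*A*B + beta*C, with alpha=1, beta=0.
        ! Each slice is contiguous, so passing its first element works.
        call dgemm('N', 'N', m, n, k, 1.0d0, &
                   a(1, 1, i), m, b(1, 1, i), k, &
                   0.0d0, c(1, 1, i), m)
    end do
    !$omp end parallel do
end subroutine batched_dgemm
```

With gfortran, something like "gfortran -O2 -fopenmp -c batched_dgemm.f90"
should enable the directive; without -fopenmp the directive is treated
as a comment and the loop runs serially. How you link BLAS and wrap the
routine for Python depends on your setup.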

That should run about as fast as it gets.

Sturla
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
