Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Sturla Molden Wed, 15 Jun 2011 17:26:10 -0700

On 15.06.2011 23:22, Christopher Barker wrote:
>
> I think the issue got confused -- the OP was not looking to speed up a
> matrix multiply, but rather to speed up a whole bunch of independent
> matrix multiplies.


I would do it like this:

1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
(Or use the Fortran intrinsics dot_product or matmul.)

2. Put an OpenMP directive around the loop (!$omp parallel do). Compile with
OpenMP enabled. Use static or guided thread scheduling.

3. Call Fortran from Python using f2py, ctypes or Cython.

Link against a BLAS library that is thread-safe but itself single-threaded,
so that the OpenMP threads do not compete with threads started inside BLAS.

That should run as fast as it gets.
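
The loop structure from steps 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code the recipe above would produce: it is written in C rather than Fortran, a naive triple loop stands in for DGEMM so the example has no BLAS dependency, and the names matmul_one and batched_matmul, the square n-by-n shape, and the contiguous row-major layout are all hypothetical choices made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for DGEMM (assumption): C = A * B for one pair of n-by-n
   row-major matrices. It is deliberately single-threaded; a real build
   would call a single-threaded BLAS routine here instead. */
static void matmul_one(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: multiplies `count` independent pairs of matrices.
   a, b and c each hold `count` matrices stored back to back, so pair p
   starts at offset p * n * n in every array. */
void batched_matmul(long count, size_t n,
                    const double *a, const double *b, double *c)
{
    assert(count >= 0 && n > 0);
    const size_t stride = n * n;

    /* Each iteration writes only to its own output matrix, so the
       iterations are independent and can be split across threads.
       Static scheduling fits when every product costs about the same.
       Without OpenMP enabled the pragma is ignored and the loop runs
       serially with the same result. */
    #pragma omp parallel for schedule(static)
    for (long p = 0; p < count; ++p) {
        const size_t off = (size_t)p * stride;
        matmul_one(n, a + off, b + off, c + off);
    }
}
```

With GCC the OpenMP directive is enabled by the -fopenmp flag. Built as a shared library, a function like this could then be called from Python as in step 3, with the NumPy arrays passed as contiguous double buffers.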

Sturla