On 06/15/2011 07:25 PM, Sturla Molden wrote: > Den 15.06.2011 23:22, skrev Christopher Barker: >> I think the issue got confused -- the OP was not looking to speed up a >> matrix multiply, but rather to speed up a whole bunch of independent >> matrix multiplies. > I would do it like this: > > 1. Write a Fortran function that make multiple calls DGEMM in a do loop. > (Or Fortran intrinsics dot_product or matmul.) > > 2. Put an OpenMP pragma around the loop (!$omp parallel do). Invoke the > OpenMP compiler on compilation. Use static or guided thread scheduling. > > 3. Call Fortran from Python using f2py, ctypes or Cython. > > Build with a thread-safe and single-threaded BLAS library. > > That should run as fast as it gets. > > Sturla > _______________________________________________ > NumPy-Discussion mailing list > [email protected] > http://mail.scipy.org/mailman/listinfo/numpy-discussion
The idea was to calculate: innerProduct =numpy.sum(array1*array2) where array1 and array2 are, in general, multidimensional. Numpy has an inner product function called np.inner (http://www.scipy.org/Numpy_Example_List_With_Doc#inner) which is a special case of tensordot. See the documentation for what is does and other examples. Also note that np.inner provides of the cross-products as well. For example, you can just do: >>> import numpy as np >>> a=np.arange(10).reshape(2,5) >>> a array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) >>> b=a*10 >>> b array([[ 0, 10, 20, 30, 40], [50, 60, 70, 80, 90]]) >>> c=a*b >>> c array([[ 0, 10, 40, 90, 160], [250, 360, 490, 640, 810]]) >>> c.sum() 2850 >>>d=np.inner(a,b) >>>d array([[ 300, 800], [ 800, 2550]]) >>>np.diag(d).sum() 2850 I do not know if numpy's multiplication, np.inner() and np.sum() are single-threaded and thread-safe when you link to multi-threaded blas, lapack or altas libraries. But if everything is *single-threaded* and thread-safe, then you just create a function and use Anne's very useful handythread.py (http://www.scipy.org/Cookbook/Multithreading). Otherwise you would have to determine the best approach such doing nothing (especially if the arrays are huge), or making the functions single-thread. By the way, if the arrays are sufficiently small, there is a lot of overhead involved such that there is more time in communication than computation. So you have to determine the best algorithm for you case. Bruce _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
