Hi all,

Thanks for your replies.

> Brandt Belson wrote:
> > Unfortunately I can't flatten the arrays. I'm writing a library where
> > the user supplies an inner product function for two generic objects, and
> > almost always the inner product function does large array
> > multiplications at some point. The library doesn't get to know about the
> > underlying arrays.
>
> Now I'm confused -- if the user is providing the inner product
> implementation, how can you optimize that? Or are you trying to provide
> said user with an optimized "large array multiplication" that he/she can
> use?
I'm sorry if I wasn't clear. I'm not providing a new array multiplication
function. I'm taking the inner product function (which usually contains numpy
array multiplication) from the user as a given. I am parallelizing the process
of performing *many* inner products so that each core can do them
independently. The parallelization is in performing many individual inner
products, not within each inner product/array multiplication.

> If so, then I'd post your implementation here, and folks can suggest
> improvements.

I did attach some code showing what I'm doing, but that was a few days ago, so
I'll attach it again.

> If it's regular old element-wise multiplication:
>
>    a*b
>
> (where a and b are numpy arrays)
>
> then you are right, numpy isn't using any fancy multi-core aware
> optimized package, so you should be able to make a faster version.
>
> You might try numexpr also -- it's pretty cool, though may not help for
> a single operation. It might give you some ideas, though.
>
> http://www.scipy.org/SciPyPackages/NumExpr
>
> -Chris

NumExpr looks helpful and I'll definitely look into it, but the main issue is
parallelizing many element-wise array multiplications, not speeding up the
array multiplication operation. It might be that parallelizing the individual
inner products among cores isn't the right approach, but I'm not sure it's
wrong yet.
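To make the setup concrete, here is a stripped-down sketch of the approach
(simplified from the attached code; the inner product shown is just a stand-in
for whatever the user supplies, and the function names are mine):

```python
import numpy as np
from multiprocessing import Pool

def inner_product(pair):
    # Stand-in for the user-supplied inner product: element-wise
    # multiply, then sum, i.e. numpy.sum(array1 * array2).
    a, b = pair
    return np.sum(a * b)

def serial_inner_products(pairs):
    # Baseline: compute each inner product one after another.
    return [inner_product(p) for p in pairs]

def parallel_inner_products(pairs, processes=2):
    # Each worker computes whole inner products independently. Note
    # that the arrays are pickled and copied to the workers, which
    # can easily cost more than the cheap multiply-and-sum itself.
    with Pool(processes) as pool:
        return pool.map(inner_product, pairs)

if __name__ == '__main__':
    rng = np.random.RandomState(0)
    pairs = [(rng.rand(300, 200), rng.rand(300, 200)) for _ in range(8)]
    # Both paths must agree; only the timing differs.
    assert np.allclose(serial_inner_products(pairs),
                       parallel_inner_products(pairs))
```

The pickling/copying overhead in `pool.map` is one plausible reason the
parallel version can come out slower for arrays of this size.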
> > Message: 2
> > Date: Fri, 10 Jun 2011 09:23:10 -0400
> > From: Olivier Delalleau <[email protected]>
> > Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
> >         with numpy array multiplication
> > To: Discussion of Numerical Python <[email protected]>
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > It may not work for you depending on your specific problem constraints,
> > but if you could flatten the arrays, then it would be a dot, and you
> > could maybe compute multiple such dot products by storing those
> > flattened arrays into a matrix.
> >
> > -=- Olivier
> >
> > 2011/6/10 Brandt Belson <[email protected]>
> >
> > > Hi,
> > > Thanks for getting back to me.
> > > I'm doing element-wise multiplication, basically innerProduct =
> > > numpy.sum(array1*array2) where array1 and array2 are, in general,
> > > multidimensional. I need to do many of these operations, and I'd like
> > > to split up the tasks between the different cores. I'm not using
> > > numpy.dot; if I'm not mistaken, I don't think that would do what I
> > > need.
> > > Thanks again,
> > > Brandt
> > >
> > >
> > > Message: 1
> > >> Date: Thu, 09 Jun 2011 13:11:40 -0700
> > >> From: Christopher Barker <[email protected]>
> > >> Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
> > >>         with numpy array multiplication
> > >> To: Discussion of Numerical Python <[email protected]>
> > >> Message-ID: <[email protected]>
> > >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >>
> > >> Not much time, here, but since you got no replies earlier:
> > >>
> > >> > > I'm parallelizing some code I've written using the built in
> > >> > > multiprocessing module. In my application, I need to multiply
> > >> > > many large arrays together
> > >>
> > >> is that matrix multiplication, or element-wise? If matrix, then numpy
> > >> should be using LAPACK, which, depending on how it's built, could be
> > >> using all your cores already. This is heavily dependent on how your
> > >> numpy (really the LAPACK it uses) is built.
> > >>
> > >> > > and sum the resulting product arrays (inner products).
> > >>
> > >> are you using numpy.dot() for that? If so, then the above applies to
> > >> that as well.
> > >>
> > >> I know I could look at your code to answer these questions, but I
> > >> thought this might help.
> > >>
> > >> -Chris
> > >>
> > >> --
> > >> Christopher Barker, Ph.D.
> > >> Oceanographer
> > >>
> > >> Emergency Response Division
> > >> NOAA/NOS/OR&R            (206) 526-6959   voice
> > >> 7600 Sand Point Way NE   (206) 526-6329   fax
> > >> Seattle, WA  98115       (206) 526-6317   main reception
> > >>
> > >> [email protected]

> Message: 2
> Date: Mon, 13 Jun 2011 12:51:08 -0500
> From: srean <[email protected]>
> Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory)
>         with numpy array multiplication
> To: Discussion of Numerical Python <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Looking at the code, the arrays that you are multiplying seem fairly
> small (300, 200) and you have 50 of them. So it might be the case that
> there is not enough computational work to compensate for the cost of
> forking new processes and communicating the results. Have you tried
> larger arrays and more of them?

I've tried varying the sizes and the trends are consistent -- using
multiprocessing on numpy array multiplication is slower than not using it.
For reference, I'm on a mac with the following numpy configuration:

>>> print numpy.show_config()
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-faltivec']
    define_macros = [('NO_ATLAS_INFO', 3)]
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-faltivec',
        '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
None

> If you are on an intel machine and you have MKL libraries around I
> would strongly recommend that you use the matrix multiplication
> routine if possible. MKL will do the parallelization for you. Well,
> any good BLAS implementation would do the same, you don't really need
> MKL. ATLAS and ACML would work too, just that MKL has been set up for
> us and it works well.
> To give an idea, given the amount of tuning and optimization that
> these libraries have undergone, a numpy.sum would be slower than a
> multiplication with a vector of all ones. So in the interest of speed,
> the longer you stay in the BLAS context the better.
>
> --srean

That seems like a good option. While I'd like the user to have minimal
restrictions and dependencies to consider when writing the inner product
function, maybe I should put the burden on them to parallelize the inner
products, which could be done simply by configuring numpy with MKL, I guess
(I haven't tried this yet). I'm still a bit curious what is causing my script
to be slower when the multiple inner products are parallelized.

Thanks,
Brandt
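P.S. For anyone following along, here is a small sketch of what Olivier and
srean are suggesting -- flattening each array to a vector so that the whole
batch of inner products becomes one BLAS/numpy call. It assumes all the
arrays share a shape (as in my (300, 200) case), which is why it doesn't
directly fit my generic-object library, and the function name is mine:

```python
import numpy as np

def batched_inner_products(arrays1, arrays2):
    # Flatten each multidimensional array to a row of a matrix; the
    # row-wise dot products of the two stacked matrices are exactly
    # the numpy.sum(a * b) inner products computed one by one.
    # Assumes all arrays have the same shape.
    A = np.array([a.ravel() for a in arrays1])
    B = np.array([b.ravel() for b in arrays2])
    # einsum computes just the diagonal of A @ B.T (the row-wise dots)
    # without forming the full product matrix.
    return np.einsum('ij,ij->i', A, B)

# The single-pair identity srean alludes to: a flattened BLAS dot
# gives the same number as numpy.sum(a * b).
a = np.random.rand(30, 20)
b = np.random.rand(30, 20)
assert np.allclose(np.sum(a * b), np.dot(a.ravel(), b.ravel()))
```

This keeps all the work in one threaded BLAS/numpy call instead of paying
multiprocessing's fork-and-pickle overhead per inner product.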
[attachments: myutil.py, shared_mem.py]
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
