Re: [Numpy-discussion] timing results (was: record arrays initialization)

Moroney, Catherine M (388D) Thu, 03 May 2012 10:38:58 -0700

On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote:

> A quick recap of the problem:  a 128x512 array of 7-element vectors
> (element), and a 5000-vector training dataset (targets).  For each
> vector in element, I want to find the best-match in targets, defined
> as minimizing the Euclidean distance.
> 
> I coded it up three ways: (a) looping through each vector in element
> individually, (b) vectorizing the function in the previous step, and
> (c) coding it up in Fortran.  The heart of the "find-best-match" code
> in Python looks like this, so that I'm not doing an individual loop
> through all 5000 vectors in targets:
> 
>    nlen = xelement.shape[0]
>    nvec = targets.data.shape[0]
>    x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
> 
>    diffs = ((x - targets.data)**2).sum(axis=1)
>    diffs = numpy.sqrt(diffs)
>    return int(numpy.argmin(diffs, axis=0))
> 
> Here are the results:
> 
> (a) looping through each vector:  68 seconds
> (b) vectorizing this:             58 seconds
> (c) raw Fortran with loops:       26 seconds
> 
> I was surprised to see that vectorizing didn't gain me that much time,
> and that the Fortran was so much faster than both Python alternatives.
> So, there's a lot that I don't know about how the internals of numpy
> and Python work.
> 
> Why does the loop through 128x512 elements in Python only take an
> additional 10 seconds?  What is the main purpose of vectorizing: is it
> optimization by moving the looping step out of Python and into the
> C code, or something different?
> 
> And, why is the Fortran so much faster (even without optimization)?
> 
> It looks like I'll be switching to Fortran after all.  
> 
> Catherine
> 
Actually Fortran with correct array ordering - 13 seconds!  What horrible
Python/numpy mistake am I making to cause such a slowdown?


Catherine
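
For reference, one way the search described above might be batched so
that the loop over the 128x512 elements also runs inside numpy. This is
only a sketch under stated assumptions, not the code that was timed:
the function name best_matches and the chunk parameter are hypothetical,
elements is assumed to be the grid reshaped to (128*512, 7), and targets
is assumed to be a plain (5000, 7) array (i.e. targets.data in the
snippet above). It relies on the identity
|e - t|^2 = |e|^2 - 2 e.t + |t|^2; the |e|^2 term is the same for every
target, so it can be dropped before taking the argmin, and the square
root is unnecessary because it does not change which distance is
smallest.

```python
import numpy as np

def best_matches(elements, targets, chunk=4096):
    """For each row of `elements`, return the index of the nearest row
    of `targets` (Euclidean distance).

    Hypothetical helper: `chunk` bounds the size of the temporary
    (chunk, nvec) distance matrix so memory use stays modest.
    """
    elements = np.asarray(elements, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # |t|^2 for every target, computed once: shape (nvec,)
    t_sq = (targets ** 2).sum(axis=1)

    out = np.empty(elements.shape[0], dtype=np.intp)
    for start in range(0, elements.shape[0], chunk):
        block = elements[start:start + chunk]
        # |t|^2 - 2 e.t for every (element, target) pair in this block;
        # this differs from the squared distance only by |e|^2, which
        # is constant along each row and so does not affect the argmin.
        d2 = t_sq - 2.0 * np.dot(block, targets.T)
        out[start:start + chunk] = d2.argmin(axis=1)
    return out
```

With the shapes from this thread, usage would look something like
best_matches(element.reshape(-1, 7), targets.data).reshape(128, 512).
Whether this beats the Fortran version would need to be measured; the
point is only that the Python-level loop now runs once per chunk
rather than once per element.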


_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
