On May 3, 2012, at 1:38 PM, Moroney, Catherine M (388D) wrote:
>
> On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote:
>
>> A quick recap of the problem: a 128x512 array of 7-element vectors
>> (element), and a 5000-vector training dataset (targets). For each
>> vector in element, I want to find the best match in targets, defined
>> as minimizing the Euclidean distance.
>>
>> I coded it up three ways: (a) looping through each vector in element
>> individually, (b) vectorizing the function in the previous step, and
>> (c) coding it up in Fortran. The heart of the "find-best-match" code
>> in Python looks like this, so I'm not doing an explicit loop through
>> all 5000 vectors in targets:
>>
>> nlen = xelement.shape[0]
>> nvec = targets.data.shape[0]
>> x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>>
>> diffs = ((x - targets.data)**2).sum(axis=1)
>> diffs = numpy.sqrt(diffs)
>> return int(numpy.argmin(diffs, axis=0))
>>
>> Here are the results:
>>
>> (a) looping through each vector: 68 seconds
>> (b) vectorizing this: 58 seconds
>> (c) raw Fortran with loops: 26 seconds
>>
>> I was surprised to see that vectorizing didn't gain me that much
>> time, and that the Fortran was so much faster than both Python
>> alternatives. So there's a lot I don't know about how the internals
>> of NumPy and Python work.
>>
>> Why does the loop through 128x512 elements in Python only take an
>> additional 10 seconds? What is the main purpose of vectorizing - is
>> it optimization by taking the looping step out of Python and into
>> the C base, or something different?
Because for the size of the arrays being manipulated inside the loop, the Python/NumPy loop overhead isn't all that big. If you were only doing 100 vectors in targets, you would see a big difference.

Perry
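
For readers who want to run the quoted snippet, here is a minimal self-contained rendering of the per-vector search. The function name find_best_match is an assumption, and a plain (5000, 7) targets array stands in for whatever object the original targets.data belonged to:

    import numpy

    def find_best_match(xelement, targets):
        # Hypothetical signature: xelement is one 7-element query vector,
        # targets is a (5000, 7) array of training vectors.
        nlen = xelement.shape[0]
        nvec = targets.shape[0]
        # Tile the query so it lines up row-for-row with targets.
        x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
        diffs = numpy.sqrt(((x - targets) ** 2).sum(axis=1))
        return int(numpy.argmin(diffs))

Two small observations: the explicit repeat() is unnecessary, since broadcasting (xelement - targets) produces the same (5000, 7) difference array without materializing the tiled copy, and because sqrt is monotonic the argmin can be taken on the squared distances directly.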
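
Perry's point is that the remaining cost is the Python-level loop over all 128x512 query vectors, each of which does relatively little work per iteration. Here is a sketch of vectorizing over that outer loop as well, by broadcasting a block of queries against all 5000 targets at once (the name best_matches and the chunking strategy are mine, not from the thread):

    import numpy

    def best_matches(elements, targets, chunk=512):
        # elements: (N, 7) array of query vectors, e.g. N = 128*512;
        # targets:  (5000, 7) array of training vectors.
        # Returns, for each query row, the index of the nearest target row.
        out = numpy.empty(elements.shape[0], dtype=numpy.intp)
        for start in range(0, elements.shape[0], chunk):
            block = elements[start:start + chunk]               # (m, 7)
            # Broadcasting: (m, 1, 7) - (1, 5000, 7) -> (m, 5000, 7)
            d2 = ((block[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
            # sqrt is monotonic, so argmin over squared distances suffices.
            out[start:start + chunk] = d2.argmin(axis=1)
        return out

This cuts the number of Python-level iterations from 65536 to about 128, while chunking keeps the intermediate (m, 5000, 7) difference array at a manageable size. scipy.spatial.distance.cdist(block, targets) would compute the same (m, 5000) Euclidean distance matrix if SciPy is available.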
