On May 3, 2012, at 1:38 PM, Moroney, Catherine M (388D) wrote:
>
> On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote:
>
>> A quick recap of the problem: a 128x512 array of 7-element vectors
>> (element), and a 5000-vector training dataset (targets). For each
>> vector in element, I want to find the best match in targets, defined
>> as minimizing the Euclidean distance.
>>
>> I coded it up three ways: (a) looping through each vector in element
>> individually, (b) vectorizing the function in the previous step, and
>> (c) coding it up in Fortran. The heart of the "find-best-match" code
>> in Python looks like this, so I'm not doing an explicit loop through
>> all 5000 vectors in targets:
>>
>> nlen = xelement.shape[0]
>> nvec = targets.data.shape[0]
>> x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>>
>> diffs = ((x - targets.data)**2).sum(axis=1)
>> diffs = numpy.sqrt(diffs)
>> return int(numpy.argmin(diffs, axis=0))
>>
>> Here are the results:
>>
>> (a) looping through each vector: 68 seconds
>> (b) vectorizing this: 58 seconds
>> (c) raw Fortran with loops: 26 seconds
>>
>> I was surprised to see that vectorizing didn't gain me that much
>> time, and that the Fortran was so much faster than both Python
>> alternatives. So there's a lot I don't know about how the internals
>> of NumPy and Python work.
>>
>> Why does the loop through 128x512 elements in Python only take an
>> additional 10 seconds? What is the main purpose of vectorizing - is
>> it optimization by taking the looping step out of Python and into
>> the C base, or something different?
Because for the size of the arrays being manipulated inside the loop, the Python/NumPy loop overhead isn't all that big. If you were only doing 100 vectors in targets, you would see a big difference.

Perry
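
For readers who want to run the quoted snippet, here is a minimal self-contained rendering of the per-vector search. The function name find_best_match is an assumption, and a plain (5000, 7) targets array stands in for whatever object the original targets.data belonged to:

    import numpy

    def find_best_match(xelement, targets):
        # Hypothetical signature: xelement is one 7-element query vector,
        # targets is a (5000, 7) array of training vectors.
        nlen = xelement.shape[0]
        nvec = targets.shape[0]
        # Tile the query so it lines up row-for-row with targets.
        x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
        diffs = numpy.sqrt(((x - targets) ** 2).sum(axis=1))
        return int(numpy.argmin(diffs))

Two small observations: the explicit repeat() is unnecessary, since broadcasting (xelement - targets) produces the same (5000, 7) difference array without materializing the tiled copy, and because sqrt is monotonic the argmin can be taken on the squared distances directly.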
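
Perry's point is that the remaining cost is the Python-level loop over all 128x512 query vectors, each of which does relatively little work per iteration. Here is a sketch of vectorizing over that outer loop as well, by broadcasting a block of queries against all 5000 targets at once (the name best_matches and the chunking strategy are mine, not from the thread):

    import numpy

    def best_matches(elements, targets, chunk=512):
        # elements: (N, 7) array of query vectors, e.g. N = 128*512;
        # targets:  (5000, 7) array of training vectors.
        # Returns, for each query row, the index of the nearest target row.
        out = numpy.empty(elements.shape[0], dtype=numpy.intp)
        for start in range(0, elements.shape[0], chunk):
            block = elements[start:start + chunk]               # (m, 7)
            # Broadcasting: (m, 1, 7) - (1, 5000, 7) -> (m, 5000, 7)
            d2 = ((block[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
            # sqrt is monotonic, so argmin over squared distances suffices.
            out[start:start + chunk] = d2.argmin(axis=1)
        return out

This cuts the number of Python-level iterations from 65536 to about 128, while chunking keeps the intermediate (m, 5000, 7) difference array at a manageable size. scipy.spatial.distance.cdist(block, targets) would compute the same (m, 5000) Euclidean distance matrix if SciPy is available.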
