On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote:

> A quick recap of the problem: a 128x512 array of 7-element vectors
> (element), and a 5000-vector training dataset (targets). For each vector
> in element, I want to find the best match in targets, defined as
> minimizing the Euclidean distance.
>
> I coded it up three ways: (a) looping through each vector in element
> individually, (b) vectorizing the function in the previous step, and
> (c) coding it up in Fortran. The heart of the "find-best-match" code in
> Python looks like this, so that I'm not doing an individual loop through
> all 5000 vectors in targets:
>
> nlen = xelement.shape[0]
> nvec = targets.data.shape[0]
> x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>
> diffs = ((x - targets.data)**2).sum(axis=1)
> diffs = numpy.sqrt(diffs)
> return int(numpy.argmin(diffs, axis=0))
>
> Here are the results:
>
> (a) looping through each vector: 68 seconds
> (b) vectorizing this: 58 seconds
> (c) raw Fortran with loops: 26 seconds
>
> I was surprised to see that vectorizing didn't gain me that much time,
> and that the Fortran was so much faster than both Python alternatives.
> So, there's a lot that I don't know about how the internals of NumPy and
> Python work.
>
> Why does the loop through 128x512 elements in Python only take an
> additional 10 seconds? What is the main purpose of vectorizing: is it
> optimization by taking the looping step out of Python and into the
> C-base, or something different?
>
> And why is the Fortran so much faster (even without optimization)?
>
> It looks like I'll be switching to Fortran after all.
>
> Catherine

Actually, Fortran with the correct array ordering takes 13 seconds! What
horrible Python/NumPy mistake am I making to cause such a slowdown?
Catherine
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
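
For reference, the quoted snippet vectorizes only the comparison against the 5000 targets; it is still called once per element vector, so presumably 128x512 = 65536 Python-level calls remain in both (a) and (b). A minimal sketch of one way to move that outer loop into NumPy as well follows. It assumes element has been reshaped to (65536, 7) and targets is a plain (5000, 7) array; the names best_matches and chunk are hypothetical, not from the original code. It drops the sqrt (which does not change the argmin) and expands the squared distance as |e|^2 - 2 e.t + |t|^2, working in chunks so the full 65536x5000 distance matrix never has to be held in memory at once. No timings are claimed here.

```python
import numpy as np

def best_matches(elements, targets, chunk=4096):
    """For each row of `elements`, return the index of the nearest row of `targets`.

    Hypothetical helper: `elements` is (n, d), `targets` is (m, d).
    """
    elements = np.asarray(elements, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # |t|^2 for every target, computed once: shape (m,)
    t_sq = (targets ** 2).sum(axis=1)

    out = np.empty(elements.shape[0], dtype=np.intp)
    for start in range(0, elements.shape[0], chunk):
        block = elements[start:start + chunk]            # (c, d)
        # Squared distance is |e|^2 - 2 e.t + |t|^2. The |e|^2 term is
        # constant along each row, so it cannot change the argmin and is
        # left out. The sqrt is monotonic and is skipped for the same reason.
        scores = t_sq - 2.0 * block.dot(targets.T)       # (c, m)
        out[start:start + chunk] = scores.argmin(axis=1)
    return out
```

With the shapes from the original message, the call would look something like best_matches(element.reshape(-1, 7), targets.data).reshape(128, 512), assuming targets.data is the (5000, 7) array. The chunk size trades memory for fewer Python iterations; 4096 rows against 5000 targets is roughly 160 MB of float64 per chunk.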
