On Mon, 2014-02-03 at 00:41 -0800, Dinesh Vadhia wrote:
> Does the numpy indexing refactoring address the performance of fancy
> indexing highlighted in Wes McKinney's blog some years back -
> http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to
> be preferable to fancy indexing?
Well, technically there is a pretty big difference in how those two
approach the problem: advanced indexing is much more general [1] and
(especially now) optimized for large subspaces [2]. So while the new
code is faster even for small subspaces, the linked test hits the
performance-wise worst spot (of course, making the array 10000x2 would
be a bit worse still):

In [4]: arr = np.random.randn(10000, 5)

In [5]: indexer = np.arange(10000)

In [6]: random.shuffle(indexer)

In [7]: %timeit arr[indexer]
1000 loops, best of 3: 300 us per loop

In [8]: %timeit arr.take(indexer, axis=0)
10000 loops, best of 3: 95.4 us per loop

# With a larger subspace the result is (now) different though:

In [9]: arr = np.random.randn(10000, 100)

In [10]: %timeit arr[indexer]
100 loops, best of 3: 4.85 ms per loop

In [11]: %timeit arr.take(indexer, axis=0)
100 loops, best of 3: 5.02 ms per loop

So the performance in this use case has improved (maybe by a factor of
3, which is neither shabby nor sensational), but the subspace design
and the lack of big special cases for this operation lead to some
overhead. You could likely squeeze out a bit more, but squeezing out a
lot is non-trivial [3]. However, this should remain one of the few
cases where take still outperforms advanced indexing.

- Sebastian

[1] Mostly multiple integer array indices, which take does not
support, and arbitrary memory order (a small sketch follows below).

[2] "Subspace" here means the non-advanced-indexing part. In the case
of arr[integer_array, :] the (or a) subspace is arr[0, :].

[3] Basically, we have the value and the indexing arrays to iterate.
Since the index arrays are arbitrary (unlike take, which assumes
contiguous memory) and support broadcasting, etc., I did not try to
squeeze them into a single iterator when there is a subspace (it gets
complicated: basically I want buffering for the index arrays, but if
you use buffering on the value array you cannot easily get the pointer
into the subspace, etc.). This means you have two iterators, which in
turn means that the external loop optimization is not quite trivial
(since the two inner loops can have different sizes). So *if* you were
to add the logic for the different-sized inner loops, you could use
the external loop optimization of NpyIter (illustrated below) and
probably remove the overhead causing (most of) this discrepancy.
Though I would suggest timing first to be sure this is the reason ^^.
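
To make [1] concrete, here is a minimal sketch (the array and index
values are made up for illustration, not taken from the benchmark
above) of an advanced-indexing pattern that take cannot express
directly:

import numpy as np

arr = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Advanced indexing with two integer arrays: after broadcasting, this
# picks the elements at positions (0, 1) and (2, 3).
print(arr[rows, cols])  # -> [ 1 11]

# take() only accepts a single index array along one axis, so the
# closest equivalent needs manual flat-index arithmetic:
print(arr.take(rows * arr.shape[1] + cols))  # -> [ 1 11]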
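
For [2], a small script to reproduce the subspace-size crossover from
the session above (the exact numbers will of course vary with NumPy
version and machine):

import numpy as np
from timeit import timeit

indexer = np.random.permutation(10000)

# The second axis is the "subspace": each index selects one row of it.
for ncols in (5, 100):
    arr = np.random.randn(10000, ncols)
    fancy = timeit(lambda: arr[indexer], number=100)
    take = timeit(lambda: arr.take(indexer, axis=0), number=100)
    print(ncols, fancy, take)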
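
And on the external loop optimization mentioned in [3]: the
Python-level nditer exposes the same flag as the C-level NpyIter, so
to get a rough feel for what it does (this is just an illustration,
not the actual indexing code path):

import numpy as np

a = np.arange(6).reshape(2, 3)

# Default: the iterator hands over one (0-d) element at a time.
print([int(x) for x in np.nditer(a)])  # -> [0, 1, 2, 3, 4, 5]

# external_loop: the iterator hands over the largest contiguous 1-d
# chunks it can, so the inner loop runs over many elements at once.
for chunk in np.nditer(a, flags=['external_loop']):
    print(chunk)  # -> [0 1 2 3 4 5], a single chunk here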