Hi Wes, I believe that Mark rewrote a bunch of the fancy-indexing-related code from scratch in the masked-NA branch. I don't know if it affects anything you're talking about here, but just as a heads up, you might want to benchmark master, since it may have a different performance profile.
-- Nathaniel On Fri, Apr 6, 2012 at 4:04 AM, Wes McKinney <wesmck...@gmail.com> wrote: > dear all, > > I've routinely found that: > > 1) ndarray.take is up to 1 order of magnitude faster than fancy indexing > 2) Hand-coded Cython boolean indexing is many times faster than ndarray > indexing > 3) putmask is significantly faster than ndarray indexing > > For example, I stumbled on this tonight: > > straightforward cython code: > > def row_bool_subset(ndarray[float64_t, ndim=2] values, > ndarray[uint8_t, cast=True] mask): > cdef: > Py_ssize_t i, j, n, k, pos = 0 > ndarray[float64_t, ndim=2] out > > n, k = (<object> values).shape > assert(n == len(mask)) > > out = np.empty((mask.sum(), k), dtype=np.float64) > > for i in range(n): > if mask[i]: > for j in range(k): > out[pos, j] = values[i, j] > pos += 1 > > return out > > In [1]: values = randn(1000000, 4) > > In [2]: mask = np.ones(1000000, dtype=bool) > > In [3]: import pandas._sandbox as sbx > > In [4]: result = sbx.row_bool_subset(values, mask) > > In [5]: timeit result = sbx.row_bool_subset(values, mask) > 100 loops, best of 3: 9.58 ms per loop > > pure NumPy: > > In [6]: timeit values[mask] > 10 loops, best of 3: 81.6 ms per loop > > Here's the kind of take performance problems that I routinely experience: > > In [12]: values = randn(1000000, 4) > v > In [13]: values.shape > Out[13]: (1000000, 4) > > In [14]: indexer = np.random.permutation(1000000)[:500000] > > In [15]: timeit values[indexer] > 10 loops, best of 3: 70.7 ms per loop > > In [16]: timeit values.take(indexer, axis=0) > 100 loops, best of 3: 13.3 ms per loop > > When I can spare the time in the future I will personally work on > these indexing routines in the C codebase, but I suspect that I'm not > the only one adversely affected by these kinds of performance > problems, and it might be worth thinking about a community effort to > split up the work of retooling these methods to be more performant. We > could use a tool like my vbench project (github.com/wesm/vbench) to > make a list of the performance benchmarks of interest (like the ones > above). > > Unfortunately I am too time constrained at least for the next 6 months > to devote to a complete rewrite of the code in question. Possibly > sometime in 2013 if no one has gotten to it yet, but this seems like > someplace that we should be concerned about as the language > performance wars continue to rage. > > - Wes > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion