Hi,
I've been mulling over a couple of ideas for new ufunc methods plus a
couple of numpy functions that I think will help implement group-by
operations with NumPy arrays.
I wanted to discuss them on this list and get input from others
before putting forward an actual proposal or patch.
The group-by operation is very common in relational algebra, and NumPy
arrays (especially structured arrays) can often be seen as database
tables. There are common and easy-to-implement approaches for select
and other relational algebra concepts, but group-by basically has to
be implemented yourself.
Here are my suggested additions to NumPy:
ufunc methods:
* reduceby(array, by, sorted=1, axis=0)
    array is the array to reduce
    by is the array that provides the grouping (can be a structured
    array or a list of arrays)
    if sorted is 1, then a faster algorithm can possibly be used.
* reducein(array, indices, axis=0)
    similar to reduceat, but the indices provide both the start and
    end points (rather than being fence-posts as in reduceat).
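
A rough sketch of how these two methods might behave, emulated with
existing NumPy calls. Neither reduceby nor reducein exists in NumPy;
the free functions below are hypothetical stand-ins that take the ufunc
as their first argument. They assume 1-D, non-empty input, ignore the
proposed sorted and axis arguments, and read "start and end points" as
half-open (start, stop) pairs.

```python
import numpy as np

def reduceby(ufunc, array, by):
    """Hypothetical stand-in for the proposed ufunc.reduceby (1-D, non-empty)."""
    array = np.asarray(array)
    # Map every key to an integer group label (0 .. ngroups-1).
    keys, labels = np.unique(np.asarray(by), return_inverse=True)
    labels = labels.ravel()
    # Bring members of each group together, preserving their original order.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # First position of each group in the reordered data.
    starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
    return keys, ufunc.reduceat(array[order], starts)

def reducein(ufunc, array, indices):
    """Hypothetical stand-in for the proposed ufunc.reducein (1-D)."""
    array = np.asarray(array)
    # Each entry of indices is assumed to be a half-open (start, stop) pair.
    return np.array([ufunc.reduce(array[start:stop]) for start, stop in indices])

keys, sums = reduceby(np.add, [1.0, 2.0, 3.0, 4.0, 5.0], ["a", "b", "a", "b", "c"])
print(keys, sums)                                          # ['a' 'b' 'c'] [4. 6. 5.]
print(reducein(np.add, [1, 2, 3, 4, 5], [(0, 2), (1, 4)]))  # [3 9]
```

Unlike reduceat, the (start, stop) pairs passed to this reducein sketch
may overlap or leave gaps, which is the difference from fence-post
indices that the description above points at.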
numpy functions (or methods):
* segment(array)
    produces an array of integers labelling the different "regions"
    of an array:
    segment([10,20,10,20,30,30,10]) would produce ([0,1,0,1,2,2,0])
* edges(array, at=True)
    produces an index array giving the edges, either in fence-post
    form for reduceat or as both boundaries for reducein.
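
One possible reading of these two helpers, sketched with existing NumPy
calls. Both functions are hypothetical. The segment sketch numbers the
regions in sorted-value order, which matches the example above (that
example does not distinguish sorted order from first-appearance order).
The edges sketch assumes that an "edge" is a position where consecutive
values change in a 1-D array, and that at=False returns half-open
(start, stop) pairs.

```python
import numpy as np

def segment(array):
    """Hypothetical: integer label for each element, one label per distinct value."""
    _, labels = np.unique(np.asarray(array), return_inverse=True)
    return labels.ravel()

def edges(array, at=True):
    """Hypothetical: boundaries of runs of equal consecutive values (1-D)."""
    array = np.asarray(array)
    # A run starts at index 0 and wherever the value differs from its predecessor.
    starts = np.flatnonzero(np.r_[True, array[1:] != array[:-1]])
    if at:
        # Fence-post form, usable directly as reduceat indices.
        return starts
    # (start, stop) pairs: each run stops where the next one starts.
    stops = np.r_[starts[1:], len(array)]
    return np.column_stack((starts, stops))

print(segment([10, 20, 10, 20, 30, 30, 10]))         # [0 1 0 1 2 2 0]
print(edges([0, 0, 1, 1, 1, 2]))                     # [0 2 5]
print(edges([0, 0, 1, 1, 1, 2], at=False).tolist())  # [[0, 2], [2, 5], [5, 6]]
```

With at=True the result can be passed straight to an existing call such
as np.add.reduceat(values, edges(sorted_keys)), provided the keys have
been sorted first.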
Thoughts on the general idea?
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
[email protected]