On Sat, Apr 10, 2010 at 1:23 PM, Travis Oliphant <[email protected]> wrote:
>
> Hi,
>
> I've been mulling over a couple of ideas for new ufunc methods plus a
> couple of numpy functions that I think will help implement group-by
> operations with NumPy arrays.
>
> I wanted to discuss them on this list before putting forward an actual
> proposal or patch to get input from others.
>
> The group-by operation is very common in relational algebra and NumPy
> arrays (especially structured arrays) can often be seen as a database
> table. There are common and easy-to implement approaches for select
> and other relational algebra concepts, but group-by basically has to
> be implemented yourself.
>
> Here are my suggested additions to NumPy:
>
> ufunc methods:
>   * reduceby (array, by, sorted=1, axis=0)
>
>       array is the array to reduce
>       by is the array to provide the grouping (can be a structured
>       array or a list of arrays)
>
>       if sorted is 1, then possibly a faster algorithm can be
>       used.
How is the grouping in "by" specified?

These functions would be very useful for statistics. One problem with the
current bincount is that it doesn't allow multi-dimensional weight arrays
(with an axis argument).

Josef

>
>   * reducein (array, indices, axis=0)
>
>       similar to reduce-at, but the indices provide both the
>       start and end points (rather than being fence-posts like reduceat).
>
> numpy functions (or methods):
>
>   * segment(array)
>
>       produce an array of integers from an array producing the
>       different "regions" of an array:
>
>       segment([10,20,10,20,30,30,10]) would produce ([0,1,0,1,2,2,0])
>
>
>   * edges(array, at=True)
>
>       produce an index array providing the edges (with either fence-post
>       like syntax for reduce-at or both boundaries like reducein).
>
>
> Thoughts?
>
> -Travis
>
>
> Thoughts on the general idea?
>
>
> --
> Travis Oliphant
> Enthought Inc.
> 1-512-536-1057
> http://www.enthought.com
> [email protected]
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
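
To make the semantics under discussion concrete, here is a rough sketch of
how the proposed operations can be approximated with calls that already
exist in NumPy (np.unique, np.argsort, ufunc.reduceat, ufunc.reduce). The
names segment, reduceby and reducein below are plain Python functions written
for illustration only; they are hypothetical stand-ins, not NumPy API, and
they handle only the non-empty 1-D case. Two things are assumptions on my
part: group labels are assigned in sorted order of the distinct values (which
happens to match the segment example in the proposal), and each index pair in
reducein is treated as a half-open [start, stop) range.

```python
import numpy as np


def segment(values):
    """Hypothetical helper: integer group label for each element.

    Labels follow the sorted order of the distinct values (an assumption;
    the proposal does not say how labels are ordered).
    """
    _, labels = np.unique(np.asarray(values), return_inverse=True)
    return labels.ravel()


def reduceby(ufunc, array, by):
    """Hypothetical stand-in for the proposed ufunc.reduceby (1-D, non-empty)."""
    array = np.asarray(array)
    keys, labels = np.unique(np.asarray(by), return_inverse=True)
    labels = labels.ravel()
    # A stable sort brings the members of each group together.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # Fence-post start index of each group, which is what reduceat expects.
    is_start = np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
    starts = np.flatnonzero(is_start)
    return keys, ufunc.reduceat(array[order], starts)


def reducein(ufunc, array, indices):
    """Hypothetical stand-in for reducein: one reduction per (start, stop) pair.

    Each pair is treated as the half-open range [start, stop) (an assumption).
    """
    array = np.asarray(array)
    return np.array([ufunc.reduce(array[start:stop]) for start, stop in indices])


by = [10, 20, 10, 20, 30, 30, 10]
data = [1, 2, 3, 4, 5, 6, 7]

labels = segment(by)                      # [0, 1, 0, 1, 2, 2, 0]
keys, sums = reduceby(np.add, data, by)   # keys [10, 20, 30], sums [11, 6, 11]
_, maxima = reduceby(np.maximum, data, by)          # [7, 4, 6]
windows = reducein(np.add, [1, 2, 3, 4, 5], [(0, 2), (1, 4)])  # [3, 9]

# For sums over 1-D data, bincount with weights gives the same totals
# (as floats): [11., 6., 11.]
bincount_sums = np.bincount(labels, weights=data)
```

The sort-then-reduceat route works for any binary ufunc, whereas the bincount
route only covers sums and, as noted above, only 1-D weights.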
