On Apr 12, 2010, at 5:31 PM, Robert Kern wrote:
We should collect all of these proposals into a NEP. To clarify what I
mean by "group-by" behavior:
Suppose I have an array of floats and an array of integers. Each element
in the array of integers represents a region in the float array of a
certain "kind". The reduction should take place over like-kind values:
Example:
add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
results in the calculations:
1 + 3 + 6 + 7
2 + 4
5 + 8 + 9
and therefore the output (notice the two arrays --- perhaps a structured
array should be returned instead...)
[0,1,2],
[17, 6, 22]
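For readers who want to try the proposed semantics today, here is a minimal sketch built only from existing NumPy calls. The name reduceby_sketch is hypothetical (add.reduceby is a proposal, not an existing ufunc method), and the sort-then-reduceat approach is just one possible way to get the same numbers, not necessarily how a real implementation would work.

```python
import numpy as np

def reduceby_sketch(ufunc, array, by):
    """Hypothetical helper: reduce `array` over groups of equal `by` values."""
    array = np.asarray(array)
    by = np.asarray(by)
    # keys: sorted distinct labels; inverse: index of each label in keys
    keys, inverse = np.unique(by, return_inverse=True)
    # A stable sort brings like-kind values together, keeping original order
    order = np.argsort(inverse, kind="stable")
    # First position of each group within the sorted data
    starts = np.searchsorted(inverse[order], np.arange(len(keys)))
    # Every key occurs at least once, so no reduceat segment is empty
    return keys, ufunc.reduceat(array[order], starts)

keys, sums = reduceby_sketch(np.add,
                             [1, 2, 3, 4, 5, 6, 7, 8, 9],
                             [0, 1, 0, 1, 2, 0, 0, 2, 2])
print(keys, sums)
```

This returns the two arrays described above: the distinct "by" values and the per-group reductions.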
The real value is when you have tabular data and you want to do
reductions in one field based on values in another field. This happens
all the time in relational algebra and would be a relatively
straightforward thing to support in ufuncs.
I might suggest a simplification where the by array must be an array
of non-negative ints such that they are indices into the output. For
example (note that I replace 2 with 3 and have no 2s in the by array):
add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
[17, 6, 0, 22]
This basically generalizes bincount() to other binary ufuncs.
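A rough sketch of this simplified form, assuming a NumPy version that provides the ufunc.at method and a ufunc that has an identity element (add, multiply, ...). The function name reduceby_indexed is made up for illustration; it is not an existing NumPy API.

```python
import numpy as np

def reduceby_indexed(ufunc, array, by):
    """Hypothetical helper: `by` holds non-negative output indices."""
    array = np.asarray(array)
    by = np.asarray(by)
    if ufunc.identity is None:
        # Assumption of this sketch: empty slots need an identity to hold
        raise ValueError("this sketch needs a ufunc with an identity")
    # One output slot per possible index; unused slots keep the identity
    out = np.full(by.max() + 1, ufunc.identity, dtype=array.dtype)
    # Unbuffered in-place operation, so repeated indices accumulate
    ufunc.at(out, by, array)
    return out

array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by = [0, 1, 0, 1, 3, 0, 0, 3, 3]
result = reduceby_indexed(np.add, array, by)
products = reduceby_indexed(np.multiply, array, by)
# For add, bincount with weights gives the same sums (as floats)
via_bincount = np.bincount(by, weights=array)
print(result, products, via_bincount)
```

For add, slot 2 holds 0 because no element maps to it; for multiply the empty slot holds 1.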
Interesting proposal. I do like having only one output.
I'm particularly interested in reductions with "by" arrays of
strings, i.e. something like:
add.reduceby([10,11,12,13,14,15,16],
by=['red','green','red','green','red','blue', 'blue']).
resulting in:
10+12+14
11+13
15+16
In practice, these would have to be essentially mapped to the kind of
integer array I used in the original example, and so I suppose if we
couple your proposal with the segment function from the rest of my
original proposal, then the same resulting functionality is available
(with perhaps the extra intermediate integer array that may not be
strictly necessary).
But, having simple building blocks is usually better in the long run
(and typically leads to better optimizations by human programmers).
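To illustrate the two-step composition described above, here is a small sketch in which np.unique stands in for the label-to-integer step. That is an assumption made for illustration; it is not the segment function from the original proposal. The intermediate integer array is the one called codes below.

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14, 15, 16])
by = np.array(['red', 'green', 'red', 'green', 'red', 'blue', 'blue'])

# Step 1: map each string to an integer code (labels come back sorted)
labels, codes = np.unique(by, return_inverse=True)

# Step 2: reduce using the codes as indices into the output
sums = np.zeros(len(labels), dtype=values.dtype)
np.add.at(sums, codes, values)

for label, total in zip(labels, sums):
    print(label, total)
```

Note that the groups come out in sorted label order (blue, green, red) rather than in order of first appearance.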
Thanks,
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
[email protected]