Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Eelco Hoogendoorn
Not off topic at all; there are several matters of naming that I have not settled on yet, and I don't think they are unimportant. Indeed, those are closely related functions, and I wasn't aware of them yet, so that's some welcome additional perspective. The Mathematica function differs in that t

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Alan G Isaac
My comment is just on the name. I'd expect something named `groupby` to behave essentially like Mathematica's `GatherBy` command. http://reference.wolfram.com/mathematica/ref/GatherBy.html I think you are after something more like Matlab's grpstats: http://www.mathworks.com/help/stats/grpstats.htm

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Eelco Hoogendoorn
Alan: The equivalent of that in my current draft would be group_by(keys, values), which is shorthand for group_by(keys).group(values); an optional values argument to the constructor of GroupBy is directly bound to return an iterable over the grouped values; but we often want to bind different value
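
The shorthand described above, binding keys and values in one call and getting back the grouped values, can be sketched with a stable sort followed by a split. This is only an illustration under stated assumptions, not the code from the draft: the name `group_values` is hypothetical, and the sketch handles only non-empty 1-D keys.

```python
import numpy as np

def group_values(keys, values):
    """Hypothetical helper: return (unique_keys, list of value arrays per key)."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    # A stable sort keeps the original order of values within each group.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    # Positions where the sorted key changes mark the start of a new group.
    boundaries = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
    unique = sorted_keys[np.concatenate(([0], boundaries))]
    return unique, np.split(values[order], boundaries)

unique, groups = group_values(list("abaabb"), np.arange(6))
for key, group in zip(unique, groups):
    print(key, group)
```

Sorting once and splitting avoids a Python-level loop over the elements, which is presumably the appeal of doing this inside numpy rather than with a dictionary.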

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Alan G Isaac
On 1/26/2014 12:02 PM, Stéfan van der Walt wrote:
> what would the output of
>
> ``group_by((key1, key2))``

I'd expect something named "groupby" to behave as below.

Alan

def groupby(seq, key):
    from collections import defaultdict
    groups = defaultdict(list)
    for item in seq:
        groups[
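
Alan's snippet is cut off in the archive, so the following is not his code. It is a minimal sketch of the behaviour he seems to describe, a GatherBy-style function that collects items sharing a key into lists. The name `gather_by` and the choice to return a list of lists are assumptions made for illustration.

```python
from collections import defaultdict

def gather_by(seq, key):
    """Hypothetical GatherBy-like helper: group items of seq by key(item)."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    # Dicts preserve insertion order (Python 3.7+), so groups appear
    # in order of the first occurrence of each key.
    return list(groups.values())

print(gather_by(range(6), lambda x: x % 2))
```

Unlike the GroupBy object in the proposal, this returns the grouped items directly and performs no reduction over them.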

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Eelco Hoogendoorn
To follow up with an example as to why it is useful that a temporary object is created, consider the following (taken from the radial reduction example):

g = group_by(np.round(radius, 5).flatten())
pp.errorbar(
    g.unique,
    g.mean(sample.flatten())[1],
    g.std(sample.fla

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Eelco Hoogendoorn
An object of type GroupBy. So a call to group_by does not return any consumable output directly. If you want for instance the unique keys, or groups if you will, you can call GroupBy.unique. In this case, for a tuple of input keys, you'd get a tuple of unique keys back. If you want to compute sever
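
The idea of an intermediate object that indexes the keys once and can then be reused can be sketched as follows. This is a hypothetical stand-in, not the GroupBy class from the draft: it supports only 1-D keys and 1-D numeric values, and the assumption that `mean` returns a `(unique, means)` pair is inferred from the `[1]` indexing used elsewhere in the thread.

```python
import numpy as np

class GroupBy:
    """Hypothetical minimal stand-in for the draft's GroupBy object."""

    def __init__(self, keys):
        keys = np.asarray(keys)
        # Index the keys once: unique keys, plus a group index per element.
        self.unique, inverse = np.unique(keys, return_inverse=True)
        self.inverse = inverse.ravel()
        self.count = np.bincount(self.inverse, minlength=len(self.unique))

    def mean(self, values):
        values = np.asarray(values, dtype=float)
        # Sum the values of each group, then divide by the group sizes.
        sums = np.bincount(self.inverse, weights=values,
                           minlength=len(self.unique))
        return self.unique, sums / self.count

def group_by(keys):
    return GroupBy(keys)

g = group_by(list("abaabb"))
unique, means = g.mean([1.0, 2.0, 3.0, 5.0, 4.0, 6.0])
print(unique, means)
```

Because the key indexing lives in the constructor, further reductions over other value arrays can reuse `g` without sorting the keys again.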

Re: [Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Stéfan van der Walt
Hi Eelco

On Sun, 26 Jan 2014 12:20:04 +0100, Eelco Hoogendoorn wrote:
> key1 = list('abaabb')
> key2 = np.random.randint(0,2,(6,2))
> values = np.random.rand(6,3)
> print group_by((key1, key2)).median(values)

I agree that group_by functionality could be handy in numpy. In the above example, what

[Numpy-discussion] Numpy Enhancement Proposal: group_by functionality

2014-01-26 Thread Eelco Hoogendoorn
Hi all, Please critique my draft exploring the possibilities of adding group_by support to numpy: http://pastebin.com/c5WLWPbp In nearly every project I work on, I require group_by functionality of some sort. There are other libraries that provide this kind of functionality, such as pandas for ins