f.i., this works as expected as well (100 keys of 1d int arrays and 100 values of 1d float arrays):
group_by(randint(0,4,(100,2))).mean(rand(100,2)) On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn < hoogendoorn.ee...@gmail.com> wrote: > If I understand you correctly, the current implementation supports these > operations. All reductions over groups (except for median) are performed > through the corresponding ufunc (see GroupBy.reduce). This works on > multidimensional arrays as well, although this broadcasting over the > non-grouping axes is accomplished using np.vectorize. Actual vectorization > only happens over the axis being grouped over, but this is usually a long > axis. If it isn't, it is more efficient to perform a reduction by means of > splitting the array by its groups first, and then map the iterable of > groups over some reduction operation (as noted in the docstring of > GroupBy.reduce). > > > On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fernández del Río < > jaime.f...@gmail.com> wrote: > >> Hi Eelco, >> >> I took a deeper look into your code a couple of weeks back. I don't think >> I have fully grasped what it allows completely, but I agree that some form >> of what you have there is highly desirable. Along the same lines, for >> sometime I have been thinking that the right place for a `groupby` in numpy >> is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a >> multidimensional version of `np.bincount(groups, weights=arr)`. You would >> then need a more powerful version of `np.unique` to produce the `groups`, >> but that is something that Joe Kington's old PR was very close to >> achieving, that should probably be resurrected as well. But yes, there >> seems to be material for a NEP here, and some guidance from one of the >> numpy devs would be helpful in getting this somewhere. >> >> Jaime >> >> >> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn < >> hoogendoorn.ee...@gmail.com> wrote: >> >>> It wouldn't hurt to have this function, but my intuition is that its use >>> will be minimal. If you are already working with sorted arrays, you already >>> have a flop cost on that order of magnitude, and the optimized merge saves >>> you a factor two at the very most. Using numpy means you are sacrificing >>> factors of two and beyond relative to pure C left right and center anyway, >>> so if this kind of thing matters to you, you probably wont be working in >>> numpy in the first place. >>> >>> That said, I share your interest in overhauling arraysetops. I see many >>> opportunities for expanding its functionality. There is a question that >>> amounts to 'how do I do group-by in numpy' on stackoverflow almost every >>> week. That would have my top priority, but also things like extending >>> np.unique to things like graph edges, or other more complex input, is very >>> often useful to me. >>> >>> Ive written up a draft <http://pastebin.com/c5WLWPbp>a while ago which >>> accomplishes all of the above and more. It reimplements functions like >>> np.unique around a common Index object. This index object encapsulates the >>> precomputation (sorting) required for efficient set-ops on different >>> datatypes, and provides a common interface to obtain the kind of >>> information you are talking about (which is used extensively internally in >>> the implementation of group_by, for instance). >>> >>> ie, this functionality allows you to write neat things like >>> group_by(randint(0,9,(100,2))).median(rand(100)) >>> >>> But I have the feeling much more could be done in this direction, and I >>> feel this draft could really use a bit of back and forth. If we are going >>> to completely rewrite arraysetops, we might as well do it right. >>> >>> >>> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fernández del Río < >>> jaime.f...@gmail.com> wrote: >>> >>>> A request was open in github to add a `merge` function to numpy that >>>> would merge two sorted 1d arrays into a single sorted 1d array. I have been >>>> playing around with that idea for a while, and have a branch in my numpy >>>> fork that adds a `mergesorted` function to `numpy.lib`: >>>> >>>> >>>> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a >>>> >>>> I drew inspiration from C++ STL algorithms, and merged into a single >>>> function what merge, set_union, set_intersection, set_difference and >>>> set_symmetric_difference do there. >>>> >>>> My first thought when implementing this was to not make it a public >>>> function, but use it under the hood to speed-up some of the functions of >>>> `arraysetops.py`, which are now merging two already sorted functions by >>>> doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my >>>> testing, but the speed-ups weren't that great. >>>> >>>> One other thing I saw value in for some of the `arraysetops.py` >>>> functions, but couldn't fully figure out, was in providing extra output >>>> aside from the merged arrays, either in the form of indices, or of boolean >>>> masks, indicating which items of the original arrays made it into the >>>> merged one, and/or where did they end up in it. >>>> >>>> Since there is at least one other person out there that likes it, is >>>> there any more interest in such a function? If yes, any comments on what >>>> the proper interface for extra output should be? Although perhaps the best >>>> is to leave that out for starters and see what use people make of it, if >>>> any. >>>> >>>> Jaime >>>> >>>> -- >>>> (\__/) >>>> ( O.o) >>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus >>>> planes de dominación mundial. >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion@scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes >> de dominación mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion