On Wed, Aug 13, 2014 at 6:17 PM, Eelco Hoogendoorn < [email protected]> wrote:
> It's pretty easy to implement this table functionality and more on top of
> the code I linked above. I still think such a comprehensive overhaul of
> arraysetops is worth discussing.
>
> import numpy as np
> import grouping
> x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
> y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
> z = np.random.randint(0,2,(9,2))
> def table(*keys):
>     """
>     desired table implementation, building on the index object
>     cleaner, and more functionality
>     performance should be the same
>     """
>     indices = [grouping.as_index(k, axis=0) for k in keys]
>     uniques = [i.unique for i in indices]
>     inverses = [i.inverse for i in indices]
>     shape = [i.groups for i in indices]
>     t = np.zeros(shape, np.int)
>     np.add.at(t, inverses, 1)
>     return tuple(uniques), t
> #here is how to use
> print table(x,y)
> #but we can use fancy keys as well; here a composite key and a row-key
> print table((x,y), z)
> #this effectively creates a sparse matrix equivalent of your desired table
> print grouping.count((x,y))
>
>
> On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser <
> [email protected]> wrote:
>
>>
>> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root <[email protected]> wrote:
>>
>>> The ever-wonderful pylab mode in matplotlib has a table function for
>>> plotting a table of text in a plot. If I remember correctly, what would
>>> happen is that matplotlib's table() function will simply obliterate
>>> numpy's table function. This isn't a show-stopper, I just wanted to point
>>> that out.
>>>
>>> Personally, while I wasn't a particular fan of "count_unique" because I
>>> wouldn't necessarily think of it when needing a contingency table, I do
>>> like that it is verb-ish. "table()", in this sense, is not a verb. That
>>> said, I am perfectly fine with it if you are fine with the name collision
>>> in pylab mode.
>>>
>>
>> Thanks for pointing that out. I only changed it to have something that
>> sounded more table-ish, like the Pandas, R and Matlab functions.
>> I won't update it right now, but if there is interest in putting it into
>> numpy, I'll rename it to avoid the pylab conflict. Anything along the lines
>> of `crosstab`, `xtable`, etc., would be fine with me.
>>
>> Warren
>>
>>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser <
>>> [email protected]> wrote:
>>>
>>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn <
>>>> [email protected]> wrote:
>>>>
>>>>> Ah yes, that's also an issue I was trying to deal with. The semantics
>>>>> I prefer in these types of operators is, as a default, to have every
>>>>> array be treated as a sequence of keys, so if calling unique(arr_2d),
>>>>> you'd get unique rows, unless you pass axis=None, in which case the
>>>>> array is flattened.
>>>>>
>>>>> I also agree that the extension you propose here is useful; but
>>>>> ideally, with a little more discussion on these subjects we can
>>>>> converge on an even more comprehensive overhaul.
>>>>>
>>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks. Prompted by that stackoverflow question, and similar
>>>>>>> problems I had to deal with myself, I started working on a much more
>>>>>>> general extension to numpy's functionality in this space. Like you
>>>>>>> noted, things get a little panda-y, but I think there is a lot of
>>>>>>> pandas' functionality that could or should be part of the numpy
>>>>>>> core, a robust set of grouping operations in particular.
>>>>>>>
>>>>>>> see pastebin here:
>>>>>>> http://pastebin.com/c5WLWPbp
>>>>>>>
>>>>>>
>>>>>> On a side note, this is related to a pull request of mine from a while
>>>>>> back: https://github.com/numpy/numpy/pull/3584
>>>>>>
>>>>>> There was a lot of disagreement on the mailing list about what to
>>>>>> call a "unique slices along a given axis" function, so I wound up closing
>>>>>> the pull request pending more discussion.
>>>>>>
>>>>>> At any rate, I think it's a useful thing to have in "base" numpy.
>>>>>>
>>>>>> _______________________________________________
>>>>>> NumPy-Discussion mailing list
>>>>>> [email protected]
>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>>
>>>>
>>>> Update: I renamed the function to `table` in the pull request:
>>>> https://github.com/numpy/numpy/pull/4958
>>>>
>>>> Warren
>>>>

Hey all,

I'm reviving this thread about the proposed `table` enhancement in
https://github.com/numpy/numpy/pull/4958, because Chuck has poked me (via the
pull request) about it, so I'm poking the mailing list.

Ignoring the issue of the name for the moment, is there any opposition to
adding the proposed `table` function to numpy? I don't think it would preclude
adding more powerful tools later, but that's not something I have time to work
on at the moment.

If the only issue is the name, I'm open to any suggestions. I started with
`count_unique`, and changed it to `table`, but Benjamin pointed out the
potential conflict of `table` with a matplotlib function.

Warren
_______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
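
For readers who do not have the `grouping` module from the pastebin, the counting step in the quoted `table` sketch can be approximated with plain NumPy. What follows is only a rough sketch under stated assumptions: it accepts equal-length 1-D keys only (no composite or row keys), it substitutes `np.unique(..., return_inverse=True)` for the `grouping.as_index` objects, and it is not the implementation from pull request 4958. The name `table` is reused here purely for illustration.

```python
import numpy as np


def table(*keys):
    """Count co-occurrences across equal-length 1-D key sequences.

    Illustrative sketch only. Returns (uniques, counts), where uniques[i]
    holds the sorted distinct values of keys[i] and counts[j, k, ...] is
    the number of positions at which that combination of values occurs.
    """
    uniques, inverses = [], []
    for key in keys:
        # inv maps every element of key to its position in u
        u, inv = np.unique(np.asarray(key), return_inverse=True)
        uniques.append(u)
        inverses.append(inv.reshape(-1))
    counts = np.zeros([len(u) for u in uniques], dtype=int)
    # np.add.at accumulates over repeated index tuples; a plain
    # counts[idx] += 1 would count each repeated tuple only once
    np.add.at(counts, tuple(inverses), 1)
    return tuple(uniques), counts


# Same x and y as in the quoted message
x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(ux, uy), counts = table(x, y)
print(ux, uy)
print(counts)
```

Rows of `counts` follow the distinct values of `x` and columns follow the distinct values of `y`, so the entries sum to `len(x)`.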
