On 25 January 2016 at 13:14, Peter Otten <__pete...@web.de> wrote: >> What do you mean by "group rows"? > > Given a table you can specify columns as keys and in the simplest case one > column where you apply an aggregate function over the sets of rows with the > same key. > > If I understand you correctly you are doing that for a single known key, > whereas I considered finding the keys part of the task. In SQL you'd spell > that > > [prob 1] > select key1, key2, sum(value) from some_table group by key1, key2; > >> I thought the OP's problem is really to filter rows which I already >> showed how to do in numpy. > > You solve > > [prob 2] > select sum(value) from some_table where key1=? and key2=?; > > You'll eventually get from [prob 2] to [prob 1], but you need a few lines of > Python.
Oh okay. It wasn't clear to me that was what the OP wanted. You can do it in numpy by combining a few pieces. First we define an array: In [1]: import numpy as np In [2]: a = np.array([[5, 1.0], [1, 3.0], [5, 2.0], [1, 4.0], [5, -1.0]]) In [3]: a Out[3]: array([[ 5., 1.], [ 1., 3.], [ 5., 2.], [ 1., 4.], [ 5., -1.]]) Now we need to sort the array: In [4]: a = np.sort(a, axis=0) In [5]: a Out[5]: array([[ 1., -1.], [ 1., 1.], [ 5., 2.], [ 5., 3.], [ 5., 4.]]) Now we can access the 2nd column easy: In [6]: a[:, 1] Out[6]: array([-1., 1., 2., 3., 4.]) But we want to split that column according to the first column. We can use the split function if we know the indices and we can get them with diff: In [7]: np.diff(a[:, 0]) Out[7]: array([ 0., 4., 0., 0.]) n [14]: np.nonzero(np.diff(a[:, 0]))[0] Out[14]: array([1]) In [15]: indices = np.nonzero(np.diff(a[:, 0]))[0] + 1 In [16]: indices Out[16]: array([2]) Now we can use these to split the second column of the sorted array: In [17]: grouped = np.split(a[:, 1], indices) In [18]: grouped Out[18]: [array([-1., 1.]), array([ 2., 3., 4.])] In [19]: list(map(np.mean, grouped)) Out[19]: [0.0, 3.0] It's not exactly straight-forward but numpy has all the primitives to make this reasonably efficient. If we also want the list of keys then: In [23]: a[np.concatenate([[0], indices]), 0] Out[23]: array([ 1., 5.]) Altogether: import numpy as np a = np.array([[5, 1.0], [1, 3.0], [5, 2.0], [1, 4.0], [5, -1.0]]) a = np.sort(a, axis = 0) indices = np.nonzero(np.diff(a[:, 0]))[0] + 1 means = list(map(np.mean, np.split(a[:, 1], indices))) keys = a[np.concatenate([[0], indices]), 0] group_means = dict(zip(keys, means)) print(group_means) # {1.0: 0.0, 5.0: 3.0} If you want to key on multiple columns use lexsort instead of sort and sum the diff array along rows but otherwise it's the same principle. Looks easier with pandas :) -- Oscar _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor