Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread josef . pktd
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias wrote:
> On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com) wrote:
> > Hashing it probably wouldn't work, too great a chance for collisions.
>
> If the string is ASCII, you can always interpret the bytes as part of an

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com) wrote:
> Hashing it probably wouldn't work, too great a chance for collisions.

If the string is ASCII, you can always interpret the bytes as part of an 8 byte integer. Or, you can map unique values to consecutive integers.
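Both suggestions can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the `keys` array is made-up sample data, and the 8-byte trick assumes every key is ASCII and at most 8 characters long.

```python
import numpy as np

# Hypothetical sample keys; not data from the thread.
keys = np.array(["a", "b", "a", "b"])

# Option 1: map unique values to consecutive integers.
# codes[i] is the position of keys[i] in the sorted array of unique keys.
uniq, codes = np.unique(keys, return_inverse=True)

# Option 2: reinterpret short ASCII strings as 8-byte integers.
# Assumes each key is ASCII and no longer than 8 characters.
as_bytes = keys.astype("S8")        # fixed-width, null-padded byte strings
as_ints = as_bytes.view(np.int64)   # same memory, viewed as one int64 per key
```

Equal keys produce equal integers in both cases, so either array can stand in for the string column in later numeric operations. The actual integer values from the second option depend on the machine's byte order.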

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:38:48 PM, Skip Montanaro (skip.montan...@gmail.com) wrote:
> Oh, cool. Precisely the sort of solution I was hoping would turn up.

Except it doesn’t seem to meet your original spec, which retrieved the first item of each *run* of an index value?

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> This is trivial in pandas. a simple groupby.

Oh, cool. Precisely the sort of solution I was hoping would turn up. Straightforward, easy for a novice data slinger like me to understand, and likely a bazillion times faster than the straightforward version.

Skip

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Jeff Reback
This is trivial in pandas: a simple groupby.

In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3], ['b', 12, -329.0]]

In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.gro
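The session above is cut off in the archive, so the exact call the author used is not shown. As a minimal sketch of what a groupby-based approach could look like, assuming the goal is the first or last row per value of column `A` (the names `first_rows` and `last_rows` are illustrative only):

```python
import pandas as pd

data = [["a", 27, 14.5], ["b", 12, 99.0], ["a", 17, 100.3], ["b", 12, -329.0]]
df = pd.DataFrame(data, columns=list("ABC"))

# First and last occurrence of each key in column A.
# Note: first()/last() take the first/last non-null value per column,
# which matches whole rows here because the data has no missing values.
first_rows = df.groupby("A").first()
last_rows = df.groupby("A").last()
```

This groups by key over the whole frame, so it returns one row per distinct key, not one row per consecutive run of a key.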

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> Any way that you can make your keys numeric? Then you can run np.diff on that first column, and use the indices of nonzero entries (np.flatnonzero) to know where values change. With a +1/-1 offset (that I am too lazy to figure out right now ;) you can then index into the original rows to ge
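The offsets mentioned in the quoted suggestion can be worked out as follows. This is a sketch on made-up numeric keys, assuming a non-empty 1-D key column; the variable names are illustrative.

```python
import numpy as np

# Hypothetical numeric key column with four runs: 0 0 | 1 1 1 | 0 | 2 2
keys = np.array([0, 0, 1, 1, 1, 0, 2, 2])

# np.diff is nonzero at index i exactly when keys[i] != keys[i + 1],
# so each entry of `change` is the last index of a run (except the final run).
change = np.flatnonzero(np.diff(keys))

# A run starts at index 0 and right after every change (+1 offset).
first_of_run = np.concatenate(([0], change + 1))

# A run ends at every change and at the final index (no offset).
last_of_run = np.concatenate((change, [len(keys) - 1]))
```

Indexing the original rows with `first_of_run` or `last_of_run` then selects the first or last row of each consecutive run, which differs from a per-key groupby when a key reappears later.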

Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> 1. This is not a NumPy question; StackExchange would be more appropriate.

Thanks, that is the fairly straightforward -- but slow -- solution, which I have already implemented. I was asking if numpy had some filtering functions which might speed things up (it's a huge library, with which I'm not

[Numpy-discussion] ANN: PyTables 3.2.3 released.

2016-07-04 Thread Tom Kooij
=========================
Announcing PyTables 3.2.3
=========================

We are happy to announce PyTables 3.2.3.

What's new
==========

This is a bug fix release. It solves many issues reported in the months since the release of 3.2.2. In case you want to know more in detail what has