[Numpy-discussion] ANN: PyTables 3.2.3 released.
=== Announcing PyTables 3.2.3 ===

We are happy to announce PyTables 3.2.3.

What's new
==========

This is a bug fix release. It solves many issues reported in the months since the release of 3.2.2.

For more detail on what has changed in this version, please refer to: http://www.pytables.org/release_notes.html

For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to cope efficiently with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
GitHub: http://www.github.com/PyTables/PyTables
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Developers

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
> 1. This is not a NumPy question; StackExchange would be more appropriate.

Thanks, that is the fairly straightforward -- but slow -- solution, which I have already implemented. I was asking if numpy had some filtering functions which might speed things up (it's a huge library, with which I'm not terribly familiar). It's fine if the answer is "no".
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
> Any way that you can make your keys numeric? Then you can run np.diff on
> that first column, and use the indices of nonzero entries (np.flatnonzero)
> to know where values change. With a +1/-1 offset (that I am too lazy to
> figure out right now ;) you can then index into the original rows to get
> either the first or last occurrence of each run.

I'll give it some thought, but one of the elements of the key is definitely a (short, < six characters) string. Hashing it probably wouldn't work, too great a chance for collisions.

S
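For reference, the np.diff / np.flatnonzero idea quoted above could look roughly like the sketch below. It assumes the keys are already numeric and that equal keys are adjacent; the arrays `keys` and `values` are made-up sample data, not the poster's.

```python
import numpy as np

# Hypothetical sample data: one numeric key and one value per row.
keys = np.array([1, 1, 2, 2, 2, 1])
values = np.array([27.0, 17.0, 12.0, -3.0, 5.0, 8.0])

# Index i is reported when keys[i + 1] differs from keys[i].
change = np.flatnonzero(np.diff(keys))

# The +1 offset gives the first row of each new run; row 0 always starts a run.
first_idx = np.concatenate(([0], change + 1))

# Without the offset, each change index is the last row of the run that just
# ended; the final row always ends a run.
last_idx = np.concatenate((change, [len(keys) - 1]))

first_values = values[first_idx]
last_values = values[last_idx]
```

Note that this picks the first and last row of each run of equal keys, so a key that reappears later (like 1 here) starts a new run.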
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
This is trivial in pandas: a simple groupby.

In [5]: from pandas import DataFrame

In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3], ['b', 12, -329.0]]

In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0

In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0

On Mon, Jul 4, 2016 at 7:27 PM, Skip Montanaro wrote:

> > Any way that you can make your keys numeric? Then you can run np.diff on
> > that first column, and use the indices of nonzero entries (np.flatnonzero)
> > to know where values change. With a +1/-1 offset (that I am too lazy to
> > figure out right now ;) you can then index into the original rows to get
> > either the first or last occurrence of each run.
>
> I'll give it some thought, but one of the elements of the key is definitely
> a (short, < six characters) string. Hashing it probably wouldn't work, too
> great a chance for collisions.
>
> S
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
> This is trivial in pandas. a simple groupby.

Oh, cool. Precisely the sort of solution I was hoping would turn up. Straightforward, easy for a novice data slinger like me to understand, and likely a bazillion times faster than the straightforward version.

Skip
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
On 4 July 2016 at 7:38:48 PM, Skip Montanaro (skip.montan...@gmail.com) wrote:

> Oh, cool. Precisely the sort of solution I was hoping would turn up.

Except it doesn’t seem to meet your original spec, which retrieved the first item of each *run* of an index value?
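To illustrate the distinction, here is a small sketch contrasting a per-key groupby with a per-run grouping in pandas. The shift/cumsum run labelling is one common idiom and not something proposed in the thread; the data and the name `run_id` are made up.

```python
import pandas as pd

# Hypothetical data in which key 'a' occurs in two separate runs.
df = pd.DataFrame(
    [["a", 27, 14.5], ["a", 17, 100.3], ["b", 12, 99.0], ["a", 5, 1.0]],
    columns=list("ABC"),
)

# Grouping by key merges both runs of 'a' into a single group.
first_per_key = df.groupby("A").first()

# Label runs: the counter increases whenever the key differs from the row above.
run_id = (df["A"] != df["A"].shift()).cumsum().rename("run")

# Grouping by the run label keeps the two runs of 'a' separate.
first_per_run = df.groupby(run_id).first()
```

With this data the per-key result has two rows and the per-run result has three, because the second run of 'a' is counted separately.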
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com) wrote:

> Hashing it probably wouldn't work, too great a chance for collisions.

If the string is ASCII, you can always interpret the bytes as part of an 8 byte integer. Or, you can map unique values to consecutive integers.
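Both suggestions can be sketched in NumPy as follows. This assumes the keys are ASCII strings of at most eight characters; the sample array `keys` is made up.

```python
import numpy as np

# Hypothetical short ASCII keys (fewer than six characters each).
keys = np.array(["abc", "abc", "xy", "xy", "abc"])

# Option 1: pad every key to exactly 8 bytes and reinterpret those bytes as
# one unsigned 64-bit integer. Distinct keys give distinct integers, so
# there are no collisions.
as_bytes = keys.astype("S8")
as_int = as_bytes.view(np.uint64)

# Option 2: map each distinct key to a consecutive integer code.
# `uniques` holds the sorted distinct keys and `codes` indexes into it.
uniques, codes = np.unique(keys, return_inverse=True)
```

Either integer array can then be fed to np.diff to find where the key changes.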
Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias wrote:

> On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
> wrote:
>
> Hashing it probably wouldn't work, too
> great a chance for collisions.
>
>
> If the string is ASCII, you can always interpret the bytes as part of an 8
> byte integer. Or, you can map unique values to consecutive integers.

IIUC np.nonzero(a[1:] != a[:-1]) gives all changes independent of dtype. Add or subtract 1 to adjust which element is indexed.

(IIRC from a long time ago, arraysetops used/uses something like this.)

Josef
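A minimal sketch of this neighbour comparison, applied to a composite key with one string part and one integer part, might look like the following. The arrays `sym` and `num` are made-up sample data, and combining the per-column comparisons with `|` is an assumption about how a multi-part key would be handled.

```python
import numpy as np

# Hypothetical composite key: a short string plus an integer per row.
sym = np.array(["ab", "ab", "ab", "cd", "cd"])
num = np.array([1, 1, 2, 2, 2])

# Compare every element with its predecessor; no numeric conversion is
# needed because != works element-wise on string arrays as well.
changed = (sym[1:] != sym[:-1]) | (num[1:] != num[:-1])

# changed[i] is True when row i + 1 differs from row i.
change = np.flatnonzero(changed)

# Shift by one to get the first row of each run; use the index as-is
# to get the last row of each run.
first_idx = np.concatenate(([0], change + 1))
last_idx = np.concatenate((change, [len(sym) - 1]))
```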