[Numpy-discussion] ANN: PyTables 3.2.3 released.

2016-07-04 Thread Tom Kooij
===========================
 Announcing PyTables 3.2.3
===========================

We are happy to announce PyTables 3.2.3.


What's new
==========

This is a bug fix release. It solves many issues reported in the
months since the release of 3.2.2.

For more detail on what has changed in this version, please refer to:
http://www.pytables.org/release_notes.html

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to
cope efficiently with extremely large amounts of data, with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that allows data
lookups in tables exceeding 10 gigarows (10**10 rows) in less than a
tenth of a second.


Resources
=========

About PyTables: http://www.pytables.org
Github: http://www.github.com/PyTables/PyTables

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> 1. This is not a NumPy question; StackExchange would be more appropriate.

Thanks, that is the fairly straightforward -- but slow -- solution, which I
have already implemented. I was asking if numpy had some filtering functions
which might speed things up (it's a huge library, with which I'm not
terribly familiar). It's fine if the answer is "no".



Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> Any way that you can make your keys numeric? Then you can run np.diff on
> that first column, and use the indices of nonzero entries (np.flatnonzero)
> to know where values change. With a +1/-1 offset (that I am too lazy to
> figure out right now ;) you can then index into the original rows to get
> either the first or last occurrence of each run.

I'll give it some thought, but one of the elements of the key is definitely
a (short, < six characters) string.  Hashing it probably wouldn't work, too
great a chance for collisions.

S
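
For reference, the np.diff / np.flatnonzero idea quoted above can be
sketched as follows, with the +1/-1 offsets worked out. The array
contents and the names rows, keys, first_idx and last_idx are made up
for illustration; the sketch assumes the key column is numeric and the
array is non-empty.

```python
import numpy as np

# Hypothetical data: column 0 is a numeric key, column 1 is a payload value.
rows = np.array([
    [1, 27.0],
    [1, 17.0],
    [2, 12.0],
    [2, 99.0],
    [2, 5.0],
    [1, 3.0],
])
keys = rows[:, 0]

# Positions i where keys[i] != keys[i + 1]: the last row of every run
# except the final one.
change = np.flatnonzero(np.diff(keys))

# A run starts at row 0 and right after every change.
first_idx = np.concatenate(([0], change + 1))
# A run ends at every change and at the very last row.
last_idx = np.concatenate((change, [len(keys) - 1]))

first_rows = rows[first_idx]  # first row of each run
last_rows = rows[last_idx]    # last row of each run
```

Note that this picks one row per run of equal keys, so a key that
reappears later (key 1 in the made-up data) is reported again.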




Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Jeff Reback
This is trivial in pandas: a simple groupby.

In [5]: from pandas import DataFrame

In [6]: data = [['a', 27, 14.5], ['b', 12, 99.0], ['a', 17, 100.3], ['b', 12, -329.0]]

In [7]: df = DataFrame(data, columns=list('ABC'))

In [8]: df
Out[8]:
   A   B      C
0  a  27   14.5
1  b  12   99.0
2  a  17  100.3
3  b  12 -329.0

In [9]: df.groupby('A').first()
Out[9]:
    B     C
A
a  27  14.5
b  12  99.0

In [10]: df.groupby('A').last()
Out[10]:
    B      C
A
a  17  100.3
b  12 -329.0




Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Skip Montanaro
> This is trivial in pandas. a simple groupby.

Oh, cool. Precisely the sort of solution I was hoping would turn up.
Straightforward, easy for a novice data slinger like me to understand,
and likely a bazillion times faster than the straightforward version.

Skip




Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:38:48 PM, Skip Montanaro (skip.montan...@gmail.com)
wrote:

> Oh, cool. Precisely the sort of solution I was hoping would turn up.


Except it doesn’t seem to meet your original spec, which retrieved the
first item of each *run* of an index value?
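
To make the distinction concrete, here is a small sketch contrasting
"one row per key" with "one row per run", reusing the sample data from
the groupby message. The shift-based comparison is one possible way to
pick out runs in pandas, offered as an assumption rather than as the
approach anyone in the thread used; the names per_key, starts_run and
ends_run are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 27, 14.5], ["b", 12, 99.0], ["a", 17, 100.3], ["b", 12, -329.0]],
    columns=list("ABC"),
)

# One row per distinct key: 'a' and 'b' each appear exactly once.
per_key = df.groupby("A").first()

# One row per run: a run starts wherever the key differs from the row above.
# The first row compares against NaN, so it always counts as a start.
starts_run = df["A"] != df["A"].shift()
first_of_run = df[starts_run]

# Likewise, a run ends wherever the key differs from the row below.
ends_run = df["A"] != df["A"].shift(-1)
last_of_run = df[ends_run]
```

With this data the keys alternate a, b, a, b, so every row is its own
run: groupby returns two rows, while the run-based selection returns
all four.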


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
wrote:

> Hashing it probably wouldn't work, too
> great a chance for collisions.


If the string is ASCII, you can always interpret the bytes as part of an 8
byte integer. Or, you can map unique values to consecutive integers.
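
Both suggestions can be sketched in a few lines. The key values below
are made up, and the sketch assumes the strings are ASCII and at most
8 bytes long, so that a fixed-width "S8" dtype holds them without
truncation.

```python
import numpy as np

# Hypothetical keys: short ASCII strings stored as fixed-width 8-byte fields.
keys = np.array([b"abc", b"abc", b"xy", b"abc"], dtype="S8")

# Option 1: reinterpret each 8-byte field as one unsigned 64-bit integer.
# No hashing is involved, so distinct strings always give distinct integers.
as_int = keys.view(np.uint64)

# Option 2: map each distinct value to a small consecutive integer code.
# uniq holds the sorted distinct keys; codes[i] is the position of keys[i] in uniq.
uniq, codes = np.unique(keys, return_inverse=True)
```

Either as_int or codes can then stand in for the string column wherever
a numeric key is needed, for example as the input to np.diff.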


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread josef . pktd
On Tue, Jul 5, 2016 at 1:03 AM, Juan Nunez-Iglesias 
wrote:

> On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
> wrote:
>
> Hashing it probably wouldn't work, too
> great a chance for collisions.
>
>
> If the string is ASCII, you can always interpret the bytes as part of an 8
> byte integer. Or, you can map unique values to consecutive integers.
>
IIUC

np.nonzero(a[1:] != a[:-1])   gives all changes, independent of dtype. Add or
subtract 1 to adjust which element is indexed.

(IIRC from a long time ago, arraysetops used/uses something like this.)

Josef
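
A minimal sketch of this shifted comparison on a string array, reading
the expression as a[1:] != a[:-1] (an assumption about the intended
form, since the goal is to find positions where neighbours differ).
The sample values and the names first_idx and last_idx are
illustrative.

```python
import numpy as np

# Hypothetical 1-D array of string keys; no conversion to numbers is needed.
a = np.array(["abc", "abc", "xy", "xy", "abc"])

# Compare each element with its predecessor. np.nonzero returns a tuple,
# so take element 0 to get the index array. change[i] is the last index
# of a run that is followed by a different value.
change = np.nonzero(a[1:] != a[:-1])[0]

# Adding 1 moves from the end of one run to the start of the next.
first_idx = np.concatenate(([0], change + 1))
last_idx = np.concatenate((change, [len(a) - 1]))
```

Because only elementwise inequality is used, the same lines work for
integer, float or string keys alike.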

