Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Yaroslav Halchenko
On Fri, 11 Jun 2010, Keith Goodman wrote: > > For this purpose, I like Fernando's data array: > > http://github.com/fperez/datarray A very simple subclass of ndarrays > > that answers my most-wanted feature in terms of richer data > > structures. > > >...< > It looks like datarray labels the axes.

Re: [Numpy-discussion] PutSum?

2010-06-11 Thread Robert Kern
On Fri, Jun 11, 2010 at 22:06, wrote: > Hi all, > > I recently needed a simple array procedure that I assumed would be > supported (and almost is) but was surprised to find some limitations. > > I've got two arrays, A and B, and some array of indices, I, and I want to > perform the operation >

[Numpy-discussion] PutSum?

2010-06-11 Thread jordan
Hi all, I recently needed a simple array procedure that I assumed would be supported (and almost is) but was surprised to find some limitations. I've got two arrays, A and B, and some array of indices, I, and I want to perform the operation A[I] += B where I and B have the same dimensions.
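The preview above does not say which limitations were meant. A common one with A[I] += B is that repeated indices in I are applied only once. The sketch below assumes that is the case in question, and uses np.add.at, which was added to NumPy after this thread, plus np.bincount for the 1-D case. The array values are made up for illustration.

```python
import numpy as np

A = np.zeros(4)
I = np.array([0, 1, 1, 3])           # index 1 appears twice
B = np.array([1.0, 2.0, 3.0, 4.0])

# Fancy-indexed in-place add is buffered: for a repeated index,
# only one of the contributions ends up in the result.
A_fancy = A.copy()
A_fancy[I] += B

# np.add.at is unbuffered: every (index, value) pair is accumulated.
A_at = A.copy()
np.add.at(A_at, I, B)

# For 1-D non-negative integer indices, bincount with weights
# produces the same per-index sums.
A_bc = A + np.bincount(I, weights=B, minlength=A.size)
```

Here A_at and A_bc both accumulate 2.0 + 3.0 at position 1, while A_fancy keeps only one of the two values.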

Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Keith Goodman
On Fri, Jun 11, 2010 at 11:51 AM, Gael Varoquaux wrote: > On Fri, Jun 11, 2010 at 12:57:37PM -0500, Bruce Southey wrote: >>  2. Do we really need to build custom data structures (larry, pandas, >>  tabular, etc.) or are structured ndarrays enough? (My conclusion is >>  that we do need to, but othe

Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Gael Varoquaux
On Fri, Jun 11, 2010 at 12:57:37PM -0500, Bruce Southey wrote: > 2. Do we really need to build custom data structures (larry, pandas, > tabular, etc.) or are structured ndarrays enough? (My conclusion is > that we do need to, but others might disagree). If so, how much > performance are we will

Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Bruce Southey
On 06/11/2010 10:26 AM, Wes McKinney wrote: On Fri, Jun 11, 2010 at 9:46 AM, Bruce Southey wrote: On 06/09/2010 03:40 PM, Wes McKinney wrote: Dear all, We've been having discussions on the pystatsmodels mailing list recently regarding data structures and other tools for statistics /

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 15:31:45 +0200, Sturla Molden wrote: [clip] >> The innermost dimension is handled via the ufunc loop, which is a >> simple for loop with constant-size step and is given a number of >> iterations. The array iterator objects are used only for stepping >> through the outer dimensions.

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 17:17, Anne Archibald wrote: > > On the other hand, since memory reads are very slow, optimizations > that do more calculation per load/store could make a very big > difference, eliminating temporaries as a side effect. > Yes, that's the main issue, not the extra memory they use

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Dag Sverre Seljebotn
Sturla Molden wrote: > On 11.06.2010 09:14, Sebastien Binet wrote: >> it of course depends on the granularity at which you wrap and use >> numpy-core but tight loops calling ctypes ain't gonna be pretty >> performance-wise. >> > > Tight loops in Python are never pretty. > > The purpose of ve

Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Wes McKinney
On Fri, Jun 11, 2010 at 9:46 AM, Bruce Southey wrote: > On 06/09/2010 03:40 PM, Wes McKinney wrote: >> Dear all, >> >> We've been having discussions on the pystatsmodels mailing list >> recently regarding data structures and other tools for statistics / >> other related data analysis applications.

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Anne Archibald
On 11 June 2010 11:12, Benjamin Root wrote: > > > On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: >> >> >> It would also make sense to evaluate expressions like "y = b*x + a" >> without a temporary array for b*x. I know roughly how to do it, but >> don't have time to look at it before next

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Benjamin Root
On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: > > > It would also make sense to evaluate expressions like "y = b*x + a" > without a temporary array for b*x. I know roughly how to do it, but > don't have time to look at it before next year. (Yes I know about > numexpr, I am talking about p
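To show which temporary is meant in "y = b*x + a", here is a minimal sketch that removes it by hand with the ufunc `out=` argument. This is not the approach Sturla Molden has in mind (the preview is cut off before he describes it); it only shows the allocation that a smarter evaluator would avoid. The values of x, a and b are arbitrary.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
a, b = 2.0, 3.0

# Naive form: b*x allocates a temporary array, then "+ a" allocates the result.
y_naive = b * x + a

# Manual form: allocate the output once and reuse it for both steps.
y = np.empty_like(x)
np.multiply(x, b, out=y)   # y now holds b*x, written directly into y
np.add(y, a, out=y)        # y now holds b*x + a, updated in place
```

Both forms give the same values. The manual form makes two passes over memory but allocates only one array.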

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 09:14, Sebastien Binet wrote: > it of course depends on the granularity at which you wrap and use > numpy-core but tight loops calling ctypes ain't gonna be pretty > performance-wise. > Tight loops in Python are never pretty. The purpose of vectorization with NumPy is to avoid

Re: [Numpy-discussion] Tools / data structures for statistical analysis and related applications

2010-06-11 Thread Bruce Southey
On 06/09/2010 03:40 PM, Wes McKinney wrote: > Dear all, > > We've been having discussions on the pystatsmodels mailing list > recently regarding data structures and other tools for statistics / > other related data analysis applications. I believe we're trying to > answer a number of different, bu

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 10:17, Pauli Virtanen wrote: >> 1. Collect an array of pointers to each subarray (e.g. using >> std::vector or dtype**) >> 2. Dispatch on the pointer array... >> > This is actually what the current ufunc code does. > > The innermost dimension is handled via the ufunc loop, whi

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Hans Meine
On Friday 11 June 2010 10:38:28 Pauli Virtanen wrote: > Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: > > Ideally, algorithms would get wrapped in between two additional > > pre-/postprocessing steps: > > > > 1) Preprocessing: After broadcasting, transpose the input arrays such > > that they

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: [clip] > Ideally, algorithms would get wrapped in between two additional > pre-/postprocessing steps: > > 1) Preprocessing: After broadcasting, transpose the input arrays such > that they become C order. More specifically, sort the strides of one
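The preprocessing/postprocessing idea quoted above can be sketched in NumPy as follows. This assumes "sort the strides" means ordering the axes by decreasing stride so that the view becomes C-ordered. The helper names to_c_like_order and restore_order are hypothetical, not NumPy functions, and np.sqrt stands in for whatever algorithm is being wrapped.

```python
import numpy as np

def to_c_like_order(arr):
    """Return a transposed view with non-increasing strides, plus the axis permutation."""
    # Sort axes so the largest absolute stride comes first (C-like layout).
    order = np.argsort([-abs(s) for s in arr.strides], kind="stable")
    perm = tuple(int(i) for i in order)
    return arr.transpose(perm), perm

def restore_order(arr, perm):
    """Undo the axis permutation applied by to_c_like_order."""
    inv = tuple(int(i) for i in np.argsort(perm))
    return arr.transpose(inv)

# A Fortran-ordered input: its smallest stride is on the first axis.
f = np.asfortranarray(np.arange(24.0).reshape(2, 3, 4))

view, perm = to_c_like_order(f)               # preprocessing: no data is copied
result = restore_order(np.sqrt(view), perm)   # postprocessing: original axis order
```

The elementwise operation then walks memory in its natural order, and the result comes back with the axes the caller expects.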

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Hans Meine
On Thursday 10 June 2010 22:28:28 Pauli Virtanen wrote: > Some places where OpenMP could probably help are in the inner ufunc > loops. However, improving the memory efficiency of the data access > pattern is another low-hanging fruit for multidimensional arrays. I was about to mention this when th

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Thu, 10 Jun 2010 23:56:56 +0200, Sturla Molden wrote: [clip] > Also about array iterators in NumPy's C base (i.e. for doing something > along an axis): we don't need those. There is a different way of coding > which leads to faster code. > > 1. Collect an array of pointers to each subarray (e.g. u

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Francesc Alted
On Friday 11 June 2010 02:27:18 Sturla Molden wrote: > >> Another thing I did when reimplementing lfilter was "copy-in copy-out" > >> for strided arrays. > > > > What is copy-in copy-out? I am not familiar with this term. > > Strided memory access is slow. So it often helps to make a temporary
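A minimal sketch of the "copy-in copy-out" pattern described above, assuming it means: copy the strided data into a contiguous temporary, run the kernel on that, then copy the result back. The function name process_strided and the doubling kernel are made up for illustration. This is not the lfilter reimplementation mentioned in the thread.

```python
import numpy as np

def process_strided(view, kernel):
    """Run an in-place kernel on a contiguous copy of `view`, then write the result back."""
    buf = np.ascontiguousarray(view)   # copy-in (a copy is made only if view is not contiguous)
    kernel(buf)                        # the kernel sees unit-stride memory
    view[...] = buf                    # copy-out into the original strided storage

def double_in_place(buf):
    np.multiply(buf, 2.0, out=buf)

a = np.arange(12.0).reshape(3, 4)
column = a[:, 1]                       # strided view: elements are 4 items apart
process_strided(column, double_in_place)
```

After the call, column 1 of `a` has been doubled and the other columns are unchanged. Whether the extra copies pay off depends on how many passes the kernel makes over the data.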

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sebastien Binet
On Fri, 11 Jun 2010 00:25:17 +0200, Sturla Molden wrote: > On 10.06.2010 22:07, Travis Oliphant wrote: > > > >> 2. The core should be a plain DLL, loadable with ctypes. (I know David > >> Cournapeau and Robert Kern are going to hate this.) But if Python can have > >> a custom loader for .pyd fil