Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
http://github.com/rspeer/datarray represents my best guess at the SciPy BOF consensus. I recently switched the method of accessing named ticks from .named() to .named[] based on further discussion here. My implementation is still missing the case with named ticks but positional axes, however. That

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Keith Goodman
On Thu, Jul 8, 2010 at 1:20 PM, Fernando Perez wrote: > The consensus at the BoF (not that it means it's set in stone, simply > that there was a good chance for back-and-forth on the topic with many > voices) was that: > > 1. There are valid use cases for 'integer ticks', i.e. integers that > in

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Robert Kern
On Thu, Jul 8, 2010 at 22:43, Bruce Southey wrote: > On Thu, Jul 8, 2010 at 5:09 PM, Robert Kern wrote: >> On Thu, Jul 8, 2010 at 18:00, Bruce Southey wrote: >>> On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer wrote: >> Still, I have a question. Did you also agree that this should forcibly >

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 4:05 PM, Lluís wrote: > Another reason to have multiple variables is that the insertion of NaNs to > maintain shape homogeneity will make these "synthetic" NaNs indistinguishable > from other NaNs that might be in your original input data, unless you use a > masked array or

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Bruce Southey
On Thu, Jul 8, 2010 at 5:09 PM, Robert Kern wrote: > On Thu, Jul 8, 2010 at 18:00, Bruce Southey wrote: >> On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer wrote: > Still, I have a question. Did you also agree that this should forcibly > index > through ticks? > >  arr.something[in

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
> I think we have to start from the nD case, even if I (and I think most > users) will tend to think in 2D.  The rest is just going to have to be > up to developers how they want users to interact with what we, the > developers, see as axes.  No end-user wants to think about the 6th > axis of the d

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:13 PM, Christoph Gohlke wrote: > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > import numpy a = numpy.array([[1]]) > numpy.dot(a.astype('single'), a.astype('longdou
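A minimal sketch of the reduced example, written so that it runs either way. It assumes the truncated dtype name in the snippet is 'longdouble' (as in the subject line), and it makes no claim about which NumPy versions raise the TypeError; it only records what happens on the installed one.

```python
import numpy as np

a = np.array([[1]])

try:
    # Mixed-precision product from the report: single times longdouble.
    result = np.dot(a.astype("single"), a.astype("longdouble"))
    outcome = ("ok", str(result.dtype))
except TypeError as exc:
    # The behaviour described in the original report.
    outcome = ("TypeError", str(exc))

print(outcome)
```

Capturing the outcome as a tuple makes it easy to paste the result for a given NumPy build into a bug report.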

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Hannes Bretschneider
Sebastian Haase writes: > > I would expect a 700MB text file to translate into less than 200MB of > data - assuming that you are talking about decimal numbers (maybe > total of 10 digits each + spaces) and saving as float32 binary. > So the problem would "only" be the loading in - rathe

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Robert Kern
On Thu, Jul 8, 2010 at 18:00, Bruce Southey wrote: > On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer wrote: Still, I have a question. Did you also agree that this should forcibly index through ticks?  arr.something[int]      -> tick-based indexing >>> >>> Yes. >> >> I fee

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
> Then how is this not different than a record array? How is this the *same* as a record array? --Josh On Thu, Jul 8, 2010 at 2:03 PM, Rob Speer wrote: >> 3. That the  best solution to allow integer ticks while retaining >> 'normal' indexing semantics for integers would be to have >> >> arr[int

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
> 3. That the best solution to allow integer ticks while retaining > 'normal' indexing semantics for integers would be to have > > arr[int] -> normal indexing > arr.something[int] -> tick-based indexing, where an int can mean anything. All right, it's clear lots of people like it better this way,
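The split between the two kinds of indexing can be sketched roughly as follows. This is an illustration only, not datarray's implementation: the class names TickedArray and _TickIndexer are hypothetical, the accessor is called named after the .named[] spelling mentioned elsewhere in the thread, and only the 1-D case is handled.

```python
import numpy as np


class _TickIndexer:
    """Hypothetical helper: looks values up by tick instead of by position."""

    def __init__(self, parent):
        self._parent = parent

    def __getitem__(self, tick):
        # arr.named[x] -> x is matched against the ticks, whatever its type,
        # so an integer here is a tick and never a position.
        position = self._parent.ticks.index(tick)
        return self._parent.data[position]


class TickedArray:
    """Hypothetical 1-D wrapper with positional and tick-based access."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        self.ticks = list(ticks)
        if len(self.ticks) != len(self.data):
            raise ValueError("need exactly one tick per element")
        self.named = _TickIndexer(self)

    def __getitem__(self, index):
        # arr[int] -> normal positional indexing, as with a plain ndarray.
        return self.data[index]


arr = TickedArray([10.0, 20.0, 30.0], ticks=[2008, 2009, 2010])
print(arr[0])            # positional: first element
print(arr.named[2010])   # tick-based: element whose tick is 2010
```

Keeping the two paths separate means an integer such as 2010 is never ambiguous: arr[2010] is out of range positionally, while arr.named[2010] is a valid tick lookup.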

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Bruce Southey
On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer wrote: >>> Still, I have a question. Did you also agree that this should forcibly index >>> through ticks? >>> >>>  arr.something[int]      -> tick-based indexing >>> >> >> Yes. > > I feel like people are talking about different things because it's > uncle

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 1:39 PM, Rob Speer wrote: >>> Still, I have a question. Did you also agree that this should forcibly index >>> through ticks? >>> >>>  arr.something[int]      -> tick-based indexing >>> >> >> Yes. > > I feel like people are talking about different things because it's > uncle

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
>> Still, I have a question. Did you also agree that this should forcibly index >> through ticks? >> >>  arr.something[int]      -> tick-based indexing >> > > Yes. I feel like people are talking about different things because it's unclear what the .something is. If the .something is an axis name,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 1:30 PM, Lluís wrote: > Joshua Holbrook writes: >>> arr[not int] -> tick-based indexing > >> At the BoF, we chose to drop this because we wanted to allow integer ticks >> (or >> implicit type conversion, either way) without the ambiguity of, "did we mean >> that in the ndar

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Joshua Holbrook writes: >> arr[not int] -> tick-based indexing > At the BoF, we chose to drop this because we wanted to allow integer ticks (or > implicit type conversion, either way) without the ambiguity of, "did we mean > that in the ndarray sense or in a "tick with the name '1'" sense? Sorry,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 12:55 PM, Lluís wrote: > Fernando Perez writes: >> The consensus at the BoF (not that it means it's set in stone, simply >> that there was a good chance for back-and-forth on the topic with many >> voices) was that: > >> 1. There are valid use cases for 'integer ticks', i.e

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 12:20 PM, Fernando Perez wrote: > On Thu, Jul 8, 2010 at 1:15 PM, Lluís wrote: >> >> >>> My impression from SciPy was that people would prefer separate >>> accessors for names and indices, especially because integers (a really >>> common data type, after all) shouldn't be f

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Fernando Perez writes: > The consensus at the BoF (not that it means it's set in stone, simply > that there was a good chance for back-and-forth on the topic with many > voices) was that: > 1. There are valid use cases for 'integer ticks', i.e. integers that > index arbitrarily into an array ins

[Numpy-discussion] DataArray ticks

2010-07-08 Thread Keith Goodman
What do you think of adding a ticks parameter to DataArray? Would that make sense? Current behavior: >> x = DataArray([[1, 2], [3, 4]], (('row', ['A','B']), ('col', ['C', 'D']))) >> x.axes (Axis(label='row', index=0, ticks=['A', 'B']), Axis(label='col', index=1, ticks=['C', 'D'])) Proposed tick
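The axes structure shown in the "current behavior" output can be mimicked with a small stand-in. This is a sketch under assumptions: the Axis namedtuple and the build_axes helper below are hypothetical and are not datarray's own classes; they only reproduce the (label, index, ticks) shape visible in the snippet.

```python
from collections import namedtuple

# Hypothetical stand-in for the Axis objects shown in the repr above.
Axis = namedtuple("Axis", ["label", "index", "ticks"])


def build_axes(spec):
    """Turn ((label, ticks), ...) pairs into a tuple of Axis records.

    The position of each pair in `spec` becomes the axis index.
    """
    return tuple(
        Axis(label, position, list(ticks))
        for position, (label, ticks) in enumerate(spec)
    )


axes = build_axes((("row", ["A", "B"]), ("col", ["C", "D"])))
print(axes)
```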

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Fernando Perez
On Thu, Jul 8, 2010 at 1:15 PM, Lluís wrote: > > >> My impression from SciPy was that people would prefer separate >> accessors for names and indices, especially because integers (a really >> common data type, after all) shouldn't be forbidden. Also, working >> with strings of integers like '2010'

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: >> No. I'd rather go for eliminating the 'arr.year.named', and providing only: >>  * arr.__getitem__ >>  * arr.named.__getitem__ >>  * arr..__getitem__ >> >> The first being just the current ndarray.__getitem__, and the two last >> methods >> would accept both strings and integ

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Skipper Seabold writes: [...] >> If I understood well, you could have 4 axes (assuming that an Axis can only >> handle a single label/variable). >> >> a = DatArray(numpy.array([...], dtype = [("precipitation", float), >>                                         ("temperature", float)]), >>        

Re: [Numpy-discussion] [Numpy-svn] r8413 - trunk/numpy/lib - Author: oliphant - Add percentile function.

2010-07-08 Thread Keith Goodman
On Thu, Jul 8, 2010 at 12:27 PM, Sebastian Haase wrote: > isn't this related to > http://projects.scipy.org/numpy/ticket/626 > percentile() and clamp() > > which was set to invalid > > -Sebastian The new percentile function has an axis input. I like that. scipy.stats.scoreatpercentile always work

Re: [Numpy-discussion] [Numpy-svn] r8413 - trunk/numpy/lib - Author: oliphant - Add percentile function.

2010-07-08 Thread Sebastian Haase
isn't this related to http://projects.scipy.org/numpy/ticket/626 percentile() and clamp() which was set to invalid -Sebastian On Sun, May 16, 2010 at 12:11 AM, wrote: > Author: oliphant > Date: 2010-05-15 17:11:10 -0500 (Sat, 15 May 2010) > New Revision: 8413 > > Modified: >   trunk/numpy/lib

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
> No. I'd rather go for eliminating the 'arr.year.named', and providing only: >  * arr.__getitem__ >  * arr.named.__getitem__ >  * arr..__getitem__ > > The first being just the current ndarray.__getitem__, and the two last methods > would accept both strings and integers, assuming that names/ticks

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 2:41 PM, Rob Speer wrote: > On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold wrote: >> On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer wrote: >>> Your labels are unique if you look at them the right way. Here's how I >>> would represent that in a datarray: >>> * axis0 = 'city', [

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold wrote: > On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer wrote: >> Your labels are unique if you look at them the right way. Here's how I >> would represent that in a datarray: >> * axis0 = 'city', ['Austin', 'Boston', ...] >> * axis1 = 'month', ['January

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 1:38 PM, Lluís wrote: > Skipper Seabold writes: > >> On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote: > [...] >>> My proposal is that datarray.row should be equivalent to >>> datarray.axes[0], and datarray.column should be equivalent to >>> datarray.axes[1], so that you ca

[Numpy-discussion] Fwd: effect of shape=None (the default) in format.open_memmap

2010-07-08 Thread David Goldsmith
No reply? -- Forwarded message -- From: David Goldsmith Date: Tue, Jul 6, 2010 at 7:03 PM Subject: effect of shape=None (the default) in format.open_memmap To: numpy-discussion@scipy.org Hi, I'm trying to wrap my brain around the effect of leaving shape=None (the default) in for

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer wrote: >> Forgive me if this has already been addressed, but my question is >> what happens when we have more than one "label" (not as in a labeled >> axis but an observation label -- but not a tick because they're not >> unique!) per, say, row axis and h

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Wes McKinney
On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer wrote: >> Forgive me if this has already been addressed, but my question is >> what happens when we have more than one "label" (not as in a labeled >> axis but an observation label -- but not a tick because they're not >> unique!) per, say, row axis and h

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: >> Or what I was striving for: >> >>   arr.year.named[1994:2010] >>   arr.year['1994':'2010'] >>   arr.year['1994':-3] > So your proposal is, whenever there's an index that is not an integer, > look it up by name, and use .named only if you want integer tick > names? This feels
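Slicing by tick names, as in arr.year.named[1994:2010], amounts to translating tick values into positions before slicing. The sketch below is an assumption-laden illustration: the helper tick_slice is hypothetical, and it treats the stop tick as exclusive to mirror ordinary Python slices, which is a design choice the thread does not settle.

```python
import numpy as np


def tick_slice(ticks, start=None, stop=None):
    """Translate a slice expressed in tick values into a positional slice.

    Hypothetical helper; the stop tick is excluded, like a Python slice.
    """
    lo = 0 if start is None else ticks.index(start)
    hi = len(ticks) if stop is None else ticks.index(stop)
    return slice(lo, hi)


years = list(range(1990, 2011))   # ticks 1990 .. 2010 along a 'year' axis
data = np.arange(len(years))      # one value per year; value equals position

selected = data[tick_slice(years, 1994, 2010)]
print(selected)
```

With these ticks, 1994 sits at position 4 and 2010 at position 20, so the selection covers positions 4 through 19.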

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
>> But I don't understand your second example: >>>   arr.country['Spain'].year[1994:2010] > >> That seems to run straight into the index/name ambiguity. Shouldn't >> that take the 1994th through 2010th indices along the "year" axis? Not >> every axis will have names, so you can't make *all* the ind

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Skipper Seabold writes: > On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote: [...] >> My proposal is that datarray.row should be equivalent to >> datarray.axes[0], and datarray.column should be equivalent to >> datarray.axes[1], so that you can always ask for something like >> "arr.column.named(20

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
> Forgive me if this has already been addressed, but my question is > what happens when we have more than one "label" (not as in a labeled > axis but an observation label -- but not a tick because they're not > unique!) per, say, row axis and heterogeneous dtypes. This is really the > problem that

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: > On Thu, Jul 8, 2010 at 7:13 AM, Lluís wrote: >> Thus, we can use something in the middle: >> >>   arr[0,1] >>   arr.names['Netherlands',2010] # I'd rather go for 'names' instead of >> 'ticks' > Ah ha. So this is the case with positional axes but named ticks, which > we have

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Skipper Seabold
On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote: >> While I haven't had a chance to really look in-depth at the changes >> myself (I'm a busy man! So many mailing lists!), I so far like the >> look and sound of them. That's just my opinion, though. > > If people are okay with the attribute magic,

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
> While I haven't had a chance to really look in-depth at the changes > myself (I'm a busy man! So many mailing lists!), I so far like the > look and sound of them. That's just my opinion, though. If people are okay with the attribute magic, I have a proposal for more of it. In my own project whe

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Sebastian Haase
On Thu, Jul 8, 2010 at 4:46 PM, Bruce Southey wrote: > On 07/08/2010 08:52 AM, Wes McKinney wrote: >> On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider >>  wrote: >> >>> Dear NumPy developers, >>> >>> I have to process some big data files with high-frequency >>> financial data. I am trying to

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Joshua Holbrook writes: > On Thu, Jul 8, 2010 at 3:13 AM, Lluís wrote: >> Rob Speer writes: >> >> arr.country.named('Netherlands').year.named(2010) >> arr.country.named('Spain').year.named(slice(1994, 2010)) >> arr.year.named(2006).country[0:2] >> >> This looks too verbose to me. >>

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Rob Speer
On Thu, Jul 8, 2010 at 7:13 AM, Lluís wrote: > Thus, we can use something in the middle: > >   arr[0,1] >   arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' Ah ha. So this is the case with positional axes but named ticks, which we haven't really brought up yet. I'm def

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Thu, Jul 8, 2010 at 3:13 AM, Lluís wrote: > Rob Speer writes: > > arr.country.named('Netherlands').year.named(2010) > arr.country.named('Spain').year.named(slice(1994, 2010)) > arr.year.named(2006).country[0:2] > > This looks too verbose to me. > > As axes always have a total order,

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Bruce Southey
On 07/08/2010 08:52 AM, Wes McKinney wrote: > On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider > wrote: > >> Dear NumPy developers, >> >> I have to process some big data files with high-frequency >> financial data. I am trying to load a delimited text file having >> ~700 MB with ~ 10 mill

Re: [Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Wes McKinney
On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider wrote: > Dear NumPy developers, > > I have to process some big data files with high-frequency > financial data. I am trying to load a delimited text file having > ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The > machine is a Debia

[Numpy-discussion] Memory usage of numpy-arrays

2010-07-08 Thread Hannes Bretschneider
Dear NumPy developers, I have to process some big data files with high-frequency financial data. I am trying to load a delimited text file having ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The machine is a Debian Lenny server 32bit with 3GB of memory. Since the file is just 700MB I
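One way to keep peak memory close to the size of the final array is to fill a preallocated float32 buffer line by line instead of building intermediate Python lists for the whole file. The sketch below assumes a purely numeric, comma-delimited file whose row and column counts are known in advance; the helper load_float32 is hypothetical and the real file's layout is not given in the thread.

```python
import io

import numpy as np


def load_float32(lines, n_rows, n_cols, delimiter=","):
    """Parse delimited numeric text into a preallocated float32 array.

    Hypothetical helper: `lines` is any iterable of text lines, and the
    caller is assumed to know the shape up front. Supplying more than
    n_rows lines raises IndexError; fewer lines returns a shorter view.
    """
    out = np.empty((n_rows, n_cols), dtype=np.float32)
    count = 0
    for line in lines:
        # Only one row's worth of Python floats is alive at any time.
        out[count] = [float(field) for field in line.split(delimiter)]
        count += 1
    return out[:count]


sample = io.StringIO("1.5,2.5\n3.0,4.0\n")
table = load_float32(sample, n_rows=2, n_cols=2)
print(table.dtype, table.shape, table.nbytes)
```

At 4 bytes per value, the array's footprint is simply rows times columns times 4, which is the basis for the float32 estimate quoted elsewhere in the thread.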

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Lluís
Rob Speer writes: arr.country.named('Netherlands').year.named(2010) arr.country.named('Spain').year.named(slice(1994, 2010)) arr.year.named(2006).country[0:2] This looks too verbose to me. As axes always have a total order, I'd go for the most compact representation (assuming 'cou

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:25 PM, David Cournapeau wrote: > On Thu, Jul 8, 2010 at 6:13 AM, Christoph Gohlke wrote: >> Dear NumPy developers, >> >> I am trying to solve some scipy.sparse TypeError failures reported in >> [1] and reduced them to the following example: >> >> > import numpy > a = numpy.

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:59 PM, Charles R Harris wrote: > > > On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke > wrote: > > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example:

Re: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot

2010-07-08 Thread Christoph Gohlke
On 7/7/2010 9:43 PM, Charles R Harris wrote: > > > On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke > wrote: > > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example:

Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-08 Thread Joshua Holbrook
On Wed, Jul 7, 2010 at 10:25 PM, Rob Speer wrote: > Glad I finally found this discussion. > > I implemented some of the ideas from the SciPy BOF discussion, and > Joshua has already merged them into his datarray on GitHub (thanks, > Joshua, for being so fast on the merge button). > > To introduce