Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2016-02-18 Thread Antony Lee
Actually, while working on https://github.com/numpy/numpy/issues/7264 I
realized that the memory efficiency (one-pass) argument is simply incorrect:

import numpy as np

class A:
    def __getitem__(self, i):
        print("A get item", i)
        return [np.int8(1), np.int8(2)][i]
    def __len__(self):
        return 2

print(repr(np.array(A())))

This prints out

A get item 0
A get item 1
A get item 2
A get item 0
A get item 1
A get item 2
A get item 0
A get item 1
A get item 2
array([1, 2], dtype=int8)

i.e. the sequence is "turned into a concrete sequence" no less than 3 times.
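
For comparison, a minimal sketch of a single-pass construction, assuming the dtype is known up front. np.fromiter() pulls each item from an iterator once. The class name CountingSeq and its calls list are illustrative names and are not part of the report above.

```python
import numpy as np

class CountingSeq:
    """Same shape as A above, but records every __getitem__ call."""
    def __init__(self):
        self.calls = []

    def __getitem__(self, i):
        self.calls.append(i)
        return [np.int8(1), np.int8(2)][i]  # raises IndexError at i == 2

    def __len__(self):
        return 2

seq = CountingSeq()
# iter() on an object that only defines __getitem__ falls back to the old
# sequence protocol: it asks for 0, 1, 2, ... until IndexError is raised.
arr = np.fromiter(iter(seq), dtype=np.int8)
print(repr(arr), seq.calls)
```

The recorded indices show one traversal, ending with the index that raised IndexError.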

Antony

2016-01-19 11:33 GMT-08:00 Stephan Sahm:

> just to keep this from falling into the black hole - what about integrating
> fromiter into array? (see the post by Benjamin Root)
>
> for me personally, taking the first element for deducing the dtype would
> be a perfect default way to read generators. If one wants a specific other
> dtype, one could specify it like in the current fromiter method.
>
> On 15 December 2015 at 08:08, Stephan Sahm wrote:
>
>> I would like to further push Benjamin Root's suggestion:
>>
>> "Therefore, I think it is not out of the realm of reason that passing a
>> generator object and a dtype could then delegate the work under the hood to
>> np.fromiter()? I would even go so far as to raise an error if one passes a
>> generator without specifying dtype to np.array(). The point is to reduce
>> the number of entry points for creating numpy arrays."
>>
>> would this be ok?
>>
>> On Mon, Dec 14, 2015 at 6:50 PM Robert Kern wrote:
>>
>>> On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root wrote:
>>> >
>>> > Heh, never noticed that. Was it implemented more like a
>>> > generator/iterator in older versions of Python?
>>>
>>> No, it predates generators and iterators so it has always had to be
>>> implemented like that.
>>>
>>> --
>>> Robert Kern
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread Antony Lee
Mostly so that there is no performance lost when someone passes range(...)
instead of np.arange(...).  At least I had never realized that one is much
faster than the other and always just passed range() as a convenience.
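
A rough way to see the gap on a given machine. This is a sketch only: the size n and the repeat count are arbitrary, and no particular timings are assumed.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size for illustration

from_range = np.array(range(n))   # built from a Python range object
from_arange = np.arange(n)        # built by NumPy directly

# Same values either way; only the construction cost differs.
same = np.array_equal(from_range, from_arange)

t_range = timeit.timeit(lambda: np.array(range(n)), number=5)
t_arange = timeit.timeit(lambda: np.arange(n), number=5)
print(same, t_range, t_arange)
```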

Antony

2016-02-17 10:50 GMT-08:00 Chris Barker:

> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee wrote:
>
>> So how can np.array(range(...)) even work?
>>
>
> range() (in py3) is not a generator, nor is it an iterator. It is a range
> object, which is lazily evaluated, and satisfies both the iterable protocol
> and the sequence protocol (at least most of it):
>
> In [1]: r = range(10)
>
> In [2]: r[3]
> Out[2]: 3
>
> In [3]: len(r)
> Out[3]: 10
>
> In [4]: type(r)
> Out[4]: range
>
> In [9]: isinstance(r, collections.abc.Sequence)
> Out[9]: True
>
> In [10]: l = list()
>
> In [11]: isinstance(l, collections.abc.Sequence)
> Out[11]: True
>
> In [12]: isinstance(r, collections.abc.Iterable)
> Out[12]: True
>
> I'm still totally confused as to why we'd need to special-case range when
> we have arange().
>
> -CHB
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread josef.pktd
On Thu, Feb 18, 2016 at 1:15 PM, Antony Lee wrote:

> Mostly so that there is no performance lost when someone passes range(...)
> instead of np.arange(...).  At least I had never realized that one is much
> faster than the other and always just passed range() as a convenience.
>
> Antony
>
> 2016-02-17 10:50 GMT-08:00 Chris Barker:
>
>> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee wrote:
>>
>>> So how can np.array(range(...)) even work?
>>>
>>
>> range() (in py3) is not a generator, nor is it an iterator. It is a range
>> object, which is lazily evaluated, and satisfies both the iterable protocol
>> and the sequence protocol (at least most of it):
>>
>> In [1]: r = range(10)
>>
>
Thanks, I didn't know that.

The range r here doesn't get eaten by iterating through it, while

    r = (i for i in range(5))

is only good for a single pass.

(Tried on Python 3.4.)
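
The difference can be checked with a short script. This is a minimal sketch; the variable names are illustrative.

```python
r = range(5)
first_pass = list(r)
second_pass = list(r)      # a range can be walked again

g = (i for i in range(5))
gen_first = list(g)
gen_second = list(g)       # the generator is already exhausted

print(first_pass, second_pass)
print(gen_first, gen_second)
```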

Josef





Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Nathaniel Smith
Some questions it'd be good to get feedback on:

- any better ideas for naming it than "geomspace"? It's really too bad
that the 'logspace' name is already taken.

- I guess the alternative interface might be something like

np.linspace(start, stop, steps, spacing="log")

what do people think?
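
To make the difference between the two calling styles concrete, here is a sketch under stated assumptions. geomspace_sketch is a hypothetical stand-in built on np.logspace; it is not the code from the pull request, and it assumes positive endpoints.

```python
import numpy as np

def geomspace_sketch(start, stop, num=50):
    """Hypothetical stand-in: num points from start to stop, evenly
    spaced on a log scale. Assumes start and stop are both positive."""
    return np.logspace(np.log10(start), np.log10(stop), num)

# With logspace the caller passes exponents:
by_hand = np.logspace(0, 3, 4)          # exponents 0..3, base 10
# With the proposed style the endpoints are given directly:
direct = geomspace_sketch(1, 1000, 4)
print(by_hand, direct)
```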

-n

On Wed, Feb 17, 2016 at 4:35 PM, .  wrote:
> I've suggested a new function similar to logspace, but where you specify the 
> start and stop points directly instead of using log(start) and base arguments:
>
> https://github.com/numpy/numpy/issues/7255
> https://github.com/numpy/numpy/pull/7268



-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith wrote:
>
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.

geomspace() is a perfectly cromulent name, IMO.

> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?

In a new function not named `linspace()`, I think that might be fine. I do
occasionally want to swap between linear and logarithmic/geometric spacing
based on a parameter, so this doesn't violate the van Rossum Rule of
Function Signatures.
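
A sketch of what swapping the spacing by parameter could look like. The function name spaced and its spacing argument are hypothetical, chosen only for illustration; the log branch assumes positive endpoints.

```python
import numpy as np

def spaced(start, stop, num, spacing="linear"):
    """Hypothetical helper: pick the spacing rule from a parameter."""
    if spacing == "linear":
        return np.linspace(start, stop, num)
    if spacing == "log":
        # Geometric spacing; assumes start and stop are positive.
        return np.logspace(np.log10(start), np.log10(stop), num)
    raise ValueError("spacing must be 'linear' or 'log'")

lin = spaced(1, 100, 3)
log = spaced(1, 100, 3, spacing="log")
print(lin, log)
```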

--
Robert Kern


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Joseph Fox-Rabinovitz
I like the idea, as long as we all remain aware of the irony of having
a "log" spacing for a function named "lin"space.

-Joe

On Thu, Feb 18, 2016 at 2:44 PM, Robert Kern wrote:
> On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith wrote:
>>
>> Some questions it'd be good to get feedback on:
>>
>> - any better ideas for naming it than "geomspace"? It's really too bad
>> that the 'logspace' name is already taken.
>
> geomspace() is a perfectly cromulent name, IMO.
>
>> - I guess the alternative interface might be something like
>>
>> np.linspace(start, stop, steps, spacing="log")
>>
>> what do people think?
>
> In a new function not named `linspace()`, I think that might be fine. I do
> occasionally want to swap between linear and logarithmic/geometric spacing
> based on a parameter, so this doesn't violate the van Rossum Rule of
> Function Signatures.
>
> --
> Robert Kern
>


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Alan Isaac

On 2/18/2016 2:44 PM, Robert Kern wrote:

> In a new function not named `linspace()`, I think that might be fine. I do
> occasionally want to swap between linear and logarithmic/geometric spacing
> based on a parameter, so this doesn't violate the van Rossum Rule of
> Function Signatures.



Would such a new function correct the apparent mistake (?) of
`linspace` including the endpoint by default?
Or is the current API justified by its Matlab origins?
(Or have I missed the point altogether?)

If this query is annoying, please ignore it.  It is not meant to be.

Alan



Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread Chris Barker
On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee wrote:

> Mostly so that there is no performance lost when someone passes range(...)
> instead of np.arange(...).  At least I had never realized that one is much
> faster than the other and always just passed range() as a convenience.
>

Well, pretty much everything in numpy is faster if you use the numpy array
version rather than plain python -- it hardly seems like the extra code
would be worth it.

numpy's array() constructor can (and should) take an arbitrary iterable.

It does make some sense that we might want to special-case iterators,
as you don't want to loop through them too many times, which is what
np.fromiter() is for.
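
A small sketch of the current split between the two entry points, assuming a generator as input. The generator function squares is an illustrative name.

```python
import numpy as np

def squares(n):
    for i in range(n):
        yield i * i

# np.array() does not iterate a generator; it wraps the generator object
# itself in a 0-d object array.
wrapped = np.array(squares(5))

# np.fromiter() consumes it once; dtype is required, count is optional
# and lets the output be preallocated.
consumed = np.fromiter(squares(5), dtype=np.int64, count=5)
print(wrapped.shape, wrapped.dtype, consumed)
```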

and _maybe_ it would be worth special casing python lists, as you can
access items faster, and they are really, really common (or has this
already been done?), but special casing range() is getting silly. And it
might be hard to do. At the C level I suppose you could actually know what
the parameters and state of the range object are and create an array
directly from that -- but that's what arange is for...

-CHB





Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 10:19 PM, Alan Isaac wrote:
>
> On 2/18/2016 2:44 PM, Robert Kern wrote:
>>
>> In a new function not named `linspace()`, I think that might be fine. I
>> do occasionally want to swap between linear and logarithmic/geometric
>> spacing based on a parameter, so this doesn't violate the van Rossum
>> Rule of Function Signatures.
>
> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?
> (Or have I missed the point altogether?)

The last, I'm afraid. Different use cases, different conventions. Integer
ranges are half-open because that is the most useful convention in a
0-indexed ecosystem. Floating point ranges don't interface with indexing,
and the closed intervals are the most useful (or at least the most common).
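
The two conventions side by side, as a minimal sketch with illustrative variable names:

```python
import numpy as np

idx = np.arange(0, 5)                                  # half-open: 0, 1, 2, 3, 4
closed = np.linspace(0.0, 1.0, 5)                      # closed: 0, 0.25, ..., 1
half_open = np.linspace(0.0, 1.0, 5, endpoint=False)   # 0, 0.2, ..., 0.8

# Half-open integer ranges tile without overlap, which suits indexing.
tiled = np.concatenate([np.arange(0, 3), np.arange(3, 5)])
print(idx, closed, half_open, tiled)
```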

> If this query is annoying, please ignore it.  It is not meant to be.

The same for my answer.

--
Robert Kern


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Chris Barker
On Thu, Feb 18, 2016 at 2:19 PM, Alan Isaac wrote:

> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?
>

I don't think so -- we don't need no stinkin' Matlab !

But I LIKE including the endpoint in the sequence -- for the common use
cases, it's often what you want, and if it didn't include the end point but
you did want that, it would get pretty ugly to figure out how to get what
you want.

On the other hand, if I had it to do over, I would have the count specify
the number of intervals, rather than the number of items. A common case may
be: values from zero to 10 (inclusive), and I want ten steps:

In [19]: np.linspace(0, 10, 10)
Out[19]:
array([  0.    ,   1.1111,   2.2222,   3.3333,
         4.4444,   5.5556,   6.6667,   7.7778,
         8.8889,  10.    ])

HUH? I was expecting [0, 1, 2, 3, ...] (OK, not me, this isn't my first
rodeo), so now I need to do:

In [20]: np.linspace(0, 10, 11)
Out[20]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
        10.])

This gets uglier if I know what "delta" I want:

In [21]: start = 0.0; end = 9.0; delta = 1.0
In [24]: np.linspace(start, end, int((end-start)/delta))
Out[24]: array([ 0.   ,  1.125,  2.25 ,  3.375,  4.5  ,  5.625,  6.75 ,
 7.875,  9.   ])

oops!

In [25]: np.linspace(start, end, int((end-start)/delta) + 1)
Out[25]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

But in any case, there is no changing it now.
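
One way to hide the off-by-one is a small wrapper. This is a sketch: linspace_by_step is a hypothetical helper, and it assumes the span is a whole multiple of delta up to rounding error.

```python
import numpy as np

def linspace_by_step(start, end, delta):
    """Hypothetical helper: closed range from start to end in steps of delta.
    Assumes (end - start) is a whole multiple of delta, up to rounding."""
    intervals = int(round((end - start) / delta))
    return np.linspace(start, end, intervals + 1)  # points = intervals + 1

print(linspace_by_step(0.0, 9.0, 1.0))
print(linspace_by_step(0.0, 10.0, 1.0))
```

Rounding before converting to int guards against a quotient such as 8.999999 being truncated to 8.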

-CHB




Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread Antony Lee
In a sense this discussion is really about making np.array(iterable) more
efficient, so I restarted the discussion at
https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony



Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Peter Creasey
>
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.
>
> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?
>
> -n
>
You’ve got to wonder how many people actually use logspace(start,
stop, num) in preference to 10.0**linspace(start, stop, num). I
prefer the latter for clarity, and if I wanted performance I’d be
prepared to write something uglier.
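
The two spellings can be compared directly. This is a minimal sketch with arbitrary example values for start, stop and num.

```python
import numpy as np

start, stop, num = 0.0, 3.0, 4
a = np.logspace(start, stop, num)          # base 10 by default
b = 10.0 ** np.linspace(start, stop, num)  # the spelled-out form
print(a, b, np.allclose(a, b))
```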

I don’t mind geomspace(), but if you are brainstorming,

    linlogspace(start, end)  # i.e. ‘linear in log-space’

is OK for me too.

Peter