Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators
Actually, while working on https://github.com/numpy/numpy/issues/7264 I realized that the memory efficiency (one-pass) argument is simply incorrect:

    import numpy as np

    class A:
        def __getitem__(self, i):
            print("A get item", i)
            return [np.int8(1), np.int8(2)][i]

        def __len__(self):
            return 2

    print(repr(np.array(A())))

This prints out

    A get item 0
    A get item 1
    A get item 2
    A get item 0
    A get item 1
    A get item 2
    A get item 0
    A get item 1
    A get item 2
    array([1, 2], dtype=int8)

i.e. the sequence is "turned into a concrete sequence" no less than 3 times.

Antony

2016-01-19 11:33 GMT-08:00 Stephan Sahm:
> just to not prevent it from the black hole - what about integrating
> fromiter into array? (see the post by Benjamin Root)
>
> for me personally, taking the first element for deducing the dtype would
> be a perfect default way to read generators. If one wants a specific other
> dtype, one could specify it like in the current fromiter method.
>
> On 15 December 2015 at 08:08, Stephan Sahm wrote:
>
>> I would like to further push Benjamin Root's suggestion:
>>
>> "Therefore, I think it is not out of the realm of reason that passing a
>> generator object and a dtype could then delegate the work under the hood to
>> np.fromiter()? I would even go so far as to raise an error if one passes a
>> generator without specifying dtype to np.array(). The point is to reduce
>> the number of entry points for creating numpy arrays."
>>
>> would this be ok?
>>
>> On Mon, Dec 14, 2015 at 6:50 PM Robert Kern wrote:
>>
>>> On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root wrote:
>>> >
>>> > Heh, never noticed that. Was it implemented more like a
>>> > generator/iterator in older versions of Python?
>>>
>>> No, it predates generators and iterators so it has always had to be
>>> implemented like that.
>>>
>>> --
>>> Robert Kern

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
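Stephan's idea of deducing the dtype from the first element can be sketched on top of np.fromiter(), which consumes an iterator exactly once. This is a minimal sketch, assuming 1-D output of scalar elements; `array_from_iterable` and `counting_gen` are hypothetical names for illustration, not NumPy APIs.

```python
import itertools

import numpy as np


def array_from_iterable(iterable, dtype=None):
    """Hypothetical helper (not a NumPy API): build a 1-D array in a single
    pass, inferring the dtype from the first element when none is given."""
    it = iter(iterable)
    if dtype is None:
        try:
            first = next(it)
        except StopIteration:
            return np.empty(0)  # nothing to infer from; falls back to float64
        dtype = np.asarray(first).dtype
        it = itertools.chain([first], it)  # put the peeked element back in front
    # np.fromiter pulls each element from the iterator exactly once
    return np.fromiter(it, dtype=dtype)


def counting_gen(n, log):
    # Hypothetical generator that records every value it yields,
    # so we can see how many times it is consumed.
    for i in range(n):
        log.append(i)
        yield i


log = []
arr = array_from_iterable(counting_gen(4, log))
print(arr, arr.dtype, log)
```

Each value appears once in `log`, unlike the three passes in the `__getitem__` trace above. The trade-off of this design is that the first element fixes the dtype: a later value of a "wider" type (say a float after an int) is cast to that dtype rather than triggering an upcast.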
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
Mostly so that there is no performance loss when someone passes range(...) instead of np.arange(...). At least I had never realized that one is much faster than the other and always just passed range() as a convenience.

Antony

2016-02-17 10:50 GMT-08:00 Chris Barker:
> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee wrote:
>
>> So how can np.array(range(...)) even work?
>
> range() (in py3) is not a generator, nor is it an iterator. It is a range
> object, which is lazily evaluated, and satisfies both the iterable protocol
> and the sequence protocol (at least most of it):
>
>     In [1]: r = range(10)
>
>     In [2]: r[3]
>     Out[2]: 3
>
>     In [3]: len(r)
>     Out[3]: 10
>
>     In [4]: type(r)
>     Out[4]: range
>
>     In [9]: isinstance(r, collections.abc.Sequence)
>     Out[9]: True
>
>     In [10]: l = list()
>
>     In [11]: isinstance(l, collections.abc.Sequence)
>     Out[11]: True
>
>     In [12]: isinstance(r, collections.abc.Iterable)
>     Out[12]: True
>
> I'm still totally confused as to why we'd need to special-case range when
> we have arange().
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
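A quick way to see the gap Antony describes is to time both spellings. This sketch uses only `timeit` and public NumPy calls; the size `n = 100_000` is an arbitrary choice, and the absolute timings will depend on the machine and NumPy version, so no particular numbers are claimed here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size for the comparison

# Both spellings produce the same integer values...
assert np.array_equal(np.array(range(n)), np.arange(n))

# ...but np.array(range(n)) presumably has to go element by element through
# Python int objects, while np.arange(n) fills its buffer directly.
t_range = timeit.timeit(lambda: np.array(range(n)), number=20)
t_arange = timeit.timeit(lambda: np.arange(n), number=20)
print(f"np.array(range(n)): {t_range:.4f} s")
print(f"np.arange(n):       {t_arange:.4f} s")
```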
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
On Thu, Feb 18, 2016 at 1:15 PM, Antony Lee wrote:
> 2016-02-17 10:50 GMT-08:00 Chris Barker:
>
>> range() (in py3) is not a generator, nor is it an iterator.
>>
>>     In [1]: r = range(10)

thanks, I didn't know that the range r here doesn't get eaten by iterating through it, while r = (i for i in range(5)) is only good for a single pass. (tried on python 3.4)

Josef
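Josef's observation, and how np.array() treats the two kinds of object, can be checked directly. A small sketch: a range can be iterated any number of times, a generator only once, and np.array() does not iterate a generator at all but wraps it in a 0-d object array, which is the gap np.fromiter() fills.

```python
import numpy as np

r = range(5)
# A range is a Sequence: it supports len() and indexing and can be iterated repeatedly.
first_pass = list(r)
second_pass = list(r)

g = (i for i in range(5))
# A generator is a one-shot iterator: the second pass sees nothing.
gen_first = list(g)
gen_second = list(g)

print(first_pass == second_pass)  # True
print(gen_first, gen_second)      # [0, 1, 2, 3, 4] []

# np.array() consumes the range as a sequence, but wraps a generator
# as a single Python object instead of iterating over it.
a = np.array(range(5))
b = np.array(i for i in range(5))
print(a.shape, b.shape, b.dtype)  # (5,) () object
```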
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
Some questions it'd be good to get feedback on:

- any better ideas for naming it than "geomspace"? It's really too bad
  that the 'logspace' name is already taken.

- I guess the alternative interface might be something like

      np.linspace(start, stop, steps, spacing="log")

  what do people think?

-n

On Wed, Feb 17, 2016 at 4:35 PM, . wrote:
> I've suggested a new function similar to logspace, but where you specify the
> start and stop points directly instead of using log(start) and base arguments:
>
> https://github.com/numpy/numpy/issues/7255
> https://github.com/numpy/numpy/pull/7268

--
Nathaniel J. Smith -- https://vorpus.org
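The difference between the existing logspace() signature and the proposal is easiest to see side by side. NumPy releases from 1.13 onward ship np.geomspace(start, stop, num), which takes the endpoints directly; this sketch assumes such a version and uses arbitrary endpoints 1 and 1000.

```python
import numpy as np

start, stop, num = 1.0, 1000.0, 4

# logspace() takes the *exponents* of the endpoints (base 10 by default)...
via_logspace = np.logspace(np.log10(start), np.log10(stop), num)

# ...whereas geomspace() takes the endpoints themselves.
via_geomspace = np.geomspace(start, stop, num)

print(via_logspace)
print(via_geomspace)
```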
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith wrote:
>
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.

geomspace() is a perfectly cromulent name, IMO.

> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?

In a new function not named `linspace()`, I think that might be fine. I do occasionally want to swap between linear and logarithmic/geometric spacing based on a parameter, so this doesn't violate the van Rossum Rule of Function Signatures.

--
Robert Kern
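Robert's use case, choosing linear or geometric spacing from a run-time parameter, might look like the following. `spaced` and its `spacing` values are hypothetical, chosen only to mirror the `spacing="log"` idea in the thread; the geometric branch assumes a NumPy version that provides np.geomspace().

```python
import numpy as np


def spaced(start, stop, num=50, spacing="linear"):
    """Hypothetical helper (not a NumPy API): pick linear or geometric
    spacing at run time from a parameter."""
    if spacing == "linear":
        return np.linspace(start, stop, num)
    if spacing in ("log", "geometric"):
        # endpoints given directly, as in the geomspace() proposal
        return np.geomspace(start, stop, num)
    raise ValueError(f"unknown spacing: {spacing!r}")


for mode in ("linear", "log"):
    print(mode, spaced(1.0, 100.0, 3, spacing=mode))
```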
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
I like the idea, as long as we all remain aware of the irony of having a "log" spacing for a function named "lin"space.

-Joe

On Thu, Feb 18, 2016 at 2:44 PM, Robert Kern wrote:
> On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith wrote:
>> - I guess the alternative interface might be something like
>>
>> np.linspace(start, stop, steps, spacing="log")
>>
>> what do people think?
>
> In a new function not named `linspace()`, I think that might be fine. I do
> occasionally want to swap between linear and logarithmic/geometric spacing
> based on a parameter, so this doesn't violate the van Rossum Rule of
> Function Signatures.
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On 2/18/2016 2:44 PM, Robert Kern wrote:
> In a new function not named `linspace()`, I think that might be fine. I do
> occasionally want to swap between linear and logarithmic/geometric spacing
> based on a parameter, so this doesn't violate the van Rossum Rule of
> Function Signatures.

Would such a new function correct the apparent mistake (?) of
`linspace` including the endpoint by default?
Or is the current API justified by its Matlab origins?
(Or have I missed the point altogether?)

If this query is annoying, please ignore it. It is not meant to be.

Alan
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee wrote:
> Mostly so that there is no performance lost when someone passes range(...)
> instead of np.arange(...). At least I had never realized that one is much
> faster than the other and always just passed range() as a convenience.

Well, pretty much everything in numpy is faster if you use the numpy array version rather than plain python -- this hardly seems like the extra code would be worth it.

numpy's array() constructor can (and should) take an arbitrary iterable.

It does make some sense that we might want to special-case iterators, as you don't want to loop through them too many times, which is what np.fromiter() is for.

And _maybe_ it would be worth special-casing python lists, as you can access items faster, and they are really, really common (or has this already been done?), but special-casing range() is getting silly. And it might be hard to do. At the C level I suppose you could actually know what the parameters and state of the range object are and create an array directly from that -- but that's what arange is for...

-CHB
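Chris's point that the parameters of a range object are knowable holds even at the Python level, since `range` exposes `.start`, `.stop` and `.step`. The helper `array_from_range` below is hypothetical, not a NumPy API; it simply forwards those attributes to np.arange() instead of iterating over the elements, which is essentially the special case being debated.

```python
import numpy as np


def array_from_range(r):
    """Hypothetical helper (not a NumPy API): build the array for a range
    object from its parameters instead of iterating over its elements."""
    if not isinstance(r, range):
        raise TypeError("expected a range object")
    return np.arange(r.start, r.stop, r.step)


for r in (range(10), range(2, 20, 3), range(10, 0, -2), range(5, 0)):
    fast = array_from_range(r)
    slow = np.array(list(r))  # element-by-element construction
    print(r, fast, np.array_equal(fast, slow))
```

For integer arguments np.arange() yields exactly the same values as range(), including negative steps and empty ranges, so the shortcut changes only how the array is built, not what it contains.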
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Thu, Feb 18, 2016 at 10:19 PM, Alan Isaac wrote:
>
> On 2/18/2016 2:44 PM, Robert Kern wrote:
>>
>> In a new function not named `linspace()`, I think that might be fine. I do
>> occasionally want to swap between linear and logarithmic/geometric spacing
>> based on a parameter, so this doesn't violate the van Rossum Rule of
>> Function Signatures.
>
> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?
> (Or have I missed the point altogether?)

The last, I'm afraid. Different use cases, different conventions. Integer ranges are half-open because that is the most useful convention in a 0-indexed ecosystem. Floating point ranges don't interface with indexing, and the closed intervals are the most useful (or at least the most common).

> If this query is annoying, please ignore it. It is not meant to be.

The same for my answer.

--
Robert Kern
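Robert's distinction between half-open integer ranges and closed floating-point intervals shows up directly in the NumPy calls. A small sketch; note that linspace() also accepts `endpoint=False` for the half-open case.

```python
import numpy as np

# Integer ranges are half-open: the stop value is excluded,
# so arange(0, 5) has exactly 5 elements and all of them are valid indices.
idx = np.arange(0, 5)

# linspace() samples the closed interval [0, 1] by default...
closed = np.linspace(0.0, 1.0, 5)

# ...and endpoint=False gives the half-open [0, 1) variant instead.
half_open = np.linspace(0.0, 1.0, 4, endpoint=False)

print(idx)  # [0 1 2 3 4]
print(closed)
print(half_open)
```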
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Thu, Feb 18, 2016 at 2:19 PM, Alan Isaac wrote:
> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?

I don't think so -- we don't need no stinkin' Matlab!

But I LIKE including the endpoint in the sequence -- for the common use cases, it's often what you want, and if it didn't include the end point but you did want that, it would get pretty ugly to figure out how to get what you want.

On the other hand, if I had it to do over, I would have the count specify the number of intervals, rather than the number of items. A common case may be: values from zero to 10 (inclusive), and I want ten steps:

    In [19]: np.linspace(0, 10, 10)
    Out[19]:
    array([  0.    ,   1.1111,   2.2222,   3.3333,   4.4444,   5.5556,
             6.6667,   7.7778,   8.8889,  10.    ])

HUH? I was expecting [0, 1, 2, 3, ...] (OK, not me, this isn't my first Rodeo), so now I need to do:

    In [20]: np.linspace(0, 10, 11)
    Out[20]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

This gets uglier if I know what "delta" I want:

    In [21]: start = 0.0; end = 9.0; delta = 1.0

    In [24]: np.linspace(start, end, int((end - start) / delta))
    Out[24]:
    array([ 0.   ,  1.125,  2.25 ,  3.375,  4.5  ,  5.625,  6.75 ,  7.875,  9.   ])

oops!

    In [25]: np.linspace(start, end, int((end - start) / delta) + 1)
    Out[25]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

But in any case, there is no changing it now.

-CHB
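Chris's wish to specify the number of intervals, or the step size, rather than the number of points can be sketched as thin wrappers around linspace(). Both helper names are hypothetical. Rounding `(stop - start) / delta` before converting to an integer guards against floating-point ratios such as `0.3 / 0.1 == 2.9999999999999996` truncating to the wrong count.

```python
import numpy as np


def linspace_intervals(start, stop, n_intervals):
    """Hypothetical helper: n_intervals equal steps means n_intervals + 1 points."""
    return np.linspace(start, stop, n_intervals + 1)


def linspace_by_delta(start, stop, delta):
    """Hypothetical helper: points from start to stop (inclusive), spaced by ~delta.

    The interval count is rounded so that a ratio like 2.9999999999999996
    becomes 3 rather than 2; the spacing is then adjusted to hit stop exactly.
    """
    n_intervals = int(round((stop - start) / delta))
    return np.linspace(start, stop, n_intervals + 1)


print(linspace_intervals(0, 10, 10))
print(linspace_by_delta(0.0, 9.0, 1.0))
print(linspace_by_delta(0.0, 0.3, 0.1))
```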
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
In a sense this discussion is really about making np.array(iterable) more efficient, so I restarted the discussion at https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075059.html

Antony

2016-02-18 14:21 GMT-08:00 Chris Barker:
> On Thu, Feb 18, 2016 at 10:15 AM, Antony Lee wrote:
>
>> Mostly so that there is no performance lost when someone passes
>> range(...) instead of np.arange(...). At least I had never realized that
>> one is much faster than the other and always just passed range() as a
>> convenience.
>
> Well, pretty much everything in numpy is faster if you use the numpy
> array version rather than plain python -- this hardly seems like the extra
> code would be worth it.
>
> numpy's array() constructor can (and should) take an arbitrary iterable.
>
> It does make some sense that we might want to special-case iterators,
> as you don't want to loop through them too many times, which is what
> np.fromiter() is for.
>
> And _maybe_ it would be worth special-casing python lists, as you can
> access items faster, and they are really, really common (or has this
> already been done?), but special-casing range() is getting silly. And it
> might be hard to do. At the C level I suppose you could actually know what
> the parameters and state of the range object are and create an array
> directly from that -- but that's what arange is for...
>
> -CHB
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.
>
> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?
>
> -n

You’ve got to wonder how many people actually use logspace(start, stop, num) in preference to 10.0**linspace(start, stop, num) - i.e. I prefer the latter for clarity, and if I wanted performance I’d be prepared to write something more ugly.

I don’t mind geomspace(), but if you are brainstorming

    >>> linlogspace(start, end)   # i.e. ‘linear in log-space’

is ok for me too.

Peter
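Peter's comparison can be checked numerically: np.logspace(start, stop, num) produces the same values as `base ** np.linspace(start, stop, num)`, with `base` defaulting to 10.0. The exponents 0 to 3 below are arbitrary.

```python
import numpy as np

a, b, num = 0.0, 3.0, 4

# logspace(a, b, num) equals base**linspace(a, b, num), with base=10.0 by default
lhs = np.logspace(a, b, num)
rhs = 10.0 ** np.linspace(a, b, num)

# the same relationship holds for other bases
lhs2 = np.logspace(a, b, num, base=2.0)
rhs2 = 2.0 ** np.linspace(a, b, num)

print(lhs, rhs)
print(lhs2, rhs2)
```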