[Numpy-discussion] NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-08-21 Thread Nathan
Hello all,

I just opened a pull request to add NEP 55, see
https://github.com/numpy/numpy/pull/24483.

Per NEP 0, I've copied everything up to the "detailed description" section
below.

I'm looking forward to your feedback on this.

-Nathan Goldbaum

==========================================================
NEP 55 — Add a UTF-8 Variable-Width String DType to NumPy
==========================================================

:Author: Nathan Goldbaum 
:Status: Draft
:Type: Standards Track
:Created: 2023-06-29


Abstract
--------

We propose adding a new string data type to NumPy where each item in the
array is an arbitrary length UTF-8 encoded string. This will enable
performance, memory usage, and usability improvements for NumPy users,
including:

* Memory savings for workflows that currently use fixed-width strings and
  store primarily ASCII data or a mix of short and long strings in a
  single NumPy array.

* Downstream libraries and users will be able to move away from object
  arrays currently used as a substitute for variable-length string arrays,
  unlocking performance improvements by avoiding passes over the data
  outside of NumPy.

* A more intuitive user-facing API for working with arrays of Python
  strings, without a need to think about the in-memory array
  representation.

Motivation and Scope
--------------------

First, we will describe how the current state of support for string or
string-like data in NumPy arose. Next, we will summarize the last major
previous discussion about this topic. Finally, we will describe the scope
of the proposed changes to NumPy as well as changes that are explicitly
out of scope of this proposal.

History of String Support in NumPy
**********************************

Support in NumPy for textual data evolved organically in response to early
user needs and then changes in the Python ecosystem.

Support for strings was added to NumPy to support users of the NumArray
``chararray`` type. Remnants of this are still visible in the NumPy API:
string-related functionality lives in ``np.char``, to support the obsolete
``np.char.chararray`` class, deprecated since NumPy 1.4 in favor of string
DTypes.

NumPy's ``bytes_`` DType was originally used to represent the Python 2
``str`` type before Python 3 support was added to NumPy. The bytes DType
makes the most sense when it is used to represent Python 2 strings or
other null-terminated byte sequences. However, ignoring data after the
first null character means the ``bytes_`` DType is only suitable for
bytestreams that do not contain nulls, so it is a poor match for generic
bytestreams.

The ``unicode`` DType was added to support the Python 2 ``unicode`` type.
It stores data in 32-bit UCS-4 codepoints (i.e. a UTF-32 encoding), which
makes for a straightforward implementation, but is inefficient for storing
text that can be represented well using a one-byte ASCII or Latin-1
encoding. This was not a problem in Python 2, where ASCII or mostly-ASCII
text could use the Python 2 ``str`` DType (the current ``bytes_`` DType).

With the arrival of Python 3 support in NumPy, the string DTypes were
largely left alone due to backward compatibility concerns, although the
unicode DType became the default DType for ``str`` data and the old
``string`` DType was renamed the ``bytes_`` DType. This change left NumPy
with the sub-optimal situation of shipping a data type originally intended
for null-terminated bytestrings as the data type for *all* Python
``bytes`` data, and a default string type with an in-memory representation
that consumes four times as much memory as needed for ASCII or
mostly-ASCII data.

Problems with Fixed-Width Strings
*********************************

Both existing string DTypes represent fixed-width sequences, allowing
storage of the string data in the array buffer. This avoids adding
out-of-band storage to NumPy; however, it makes for an awkward user
interface. In particular, the maximum string size must be inferred by
NumPy or estimated by the user before loading the data into a NumPy array
or selecting an output DType for string operations. In the worst case,
this requires an expensive pass over the full dataset to calculate the
maximum length of an array element. It also wastes memory when array
elements have varying lengths. Pathological cases where an array stores
many short strings and a few very long strings are particularly bad for
wasting memory.

Downstream usage of string data in NumPy arrays has proven out the need
for a variable-width string data type. In practice, most downstream users
employ ``object`` arrays for this purpose. In particular, ``pandas`` has
explicitly deprecated support for NumPy fixed-width strings, coerces NumPy
fixed-width string arrays to ``object`` arrays, and in the future may
switch to only supporting string data via ``PyArrow``, which has native
support for UTF-8 encoded variable-width string arrays [1]_. This is
unfortunate, since ``object`` array

[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-08-29 Thread Nathan
The NEP was merged in draft form, see below.

https://numpy.org/neps/nep-0055-string_dtype.html

On Mon, Aug 21, 2023 at 2:36 PM Nathan  wrote:


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-11 Thread Nathan
On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser 
wrote:

>
> This will be a nice addition to NumPy, and matches a suggestion by
> @rkern (and probably others) made in the 2017 mailing list thread;
> see the last bullet of
>
>  https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html
>
> So +1 for the enhancement!
>
> Now for some nitty-gritty review...
>

Thanks for the nitty-gritty review! I was on vacation last week and haven't
had a chance to look over this in detail yet, but at first glance this
seems like a really nice improvement.

I'm going to try to integrate your proposed design into the dtype prototype
this week. If that works, I'd like to include some of the text from the
README in your repo in the NEP and add you as an author. Would that be
alright?


>
> There is a design change that I think should be made in the
> implementation of missing values.
>
> In the current design described in the NEP, and expanded on in the
> comment
>
> https://github.com/numpy/numpy/pull/24483#discussion_r1311815944,
>
> the meaning of the values `{len = 0, buf = NULL}` in an instance of
> `npy_static_string` depends on whether or not the `na_object` has been
> set in the dtype. If it has not been set, that data represents a string
> of length 0. If `na_object` *has* been set, that data represents a
> missing value. To get a string of length 0 in this case, some non-NULL
> value must be assigned to the `buf` field. (In the comment linked
> above, @ngoldbaum suggested `{0, "\0"}`, but strings are not
> NUL-terminated, so there is no need for that `\0` in `buf`, and in fact,
> with `len == 0`, it would be a bug for the pointer to be dereferenced,
> so *any* non-NULL value--valid pointer or not--could be used for `buf`.)
>
> I think it would be better if `len == 0` *always* meant a string with
> length 0, with no additional qualifications; it shouldn't be necessary
> to put some non-NULL value in `buf` just to get an empty string. We
> can achieve this if we use a bit in `len` as a flag for a missing value.
> Reserving a bit from `len` as a flag reduces the maximum possible string
> length, but as discussed in the NEP pull request, we're almost certainly
> going to reserve at least the high bit of `len` when small string
> optimization (SSO) is implemented. This will reduce the maximum string
> length to `2**(N-1)-1`, where `N` is the bit width of `size_t`
> (equivalent to using a signed type for `len`). Even if SSO isn't
> implemented immediately, we can anticipate the need for flags stored
> in `len`, and use them to implement missing values.
>
> The actual implementation of SSO will require some more design work,
> because the offset of the most significant byte of `len` within the
> `npy_static_string` struct depends on the platform endianess. For
> little-endian, the most significant byte is not the first byte in the
> struct, so the bytes available for SSO within the struct are not
> contiguous when the fields have the order `{len, buf}`.
>
> I experimented with these ideas, and put the result at
>
> https://github.com/WarrenWeckesser/experiments/tree/master/c/numpy-vstring
>
> The idea that I propose there is to make the memory layout of the
> struct depend on the endianess of the platform, so the most
> significant byte of `len` (which I called `size`, to avoid any chance
> of confusion with the actual length of the string [1]) is at the
> beginning of the struct on big-endian platforms and at the end of the
> struct for little-endian platforms. More details are included in the
> file README.md. Note that I am not suggesting that all the SSO stuff
> be included in the current NEP! This is just a proof-of-concept that
> shows one possibility for SSO.
>
> In that design, the high bit of `size` (which is `len` here) being set
> indicates that the `npy_static_string` struct should not be interpreted
> as the standard `{len, buf}` representation of a string. When the
> second highest bit is set, it means we have a missing value. If the
> second highest bit is no

[Numpy-discussion] Re: Curious performance different with np.unique on arrays of characters

2023-09-14 Thread Nathan
Looking at a py-spy profile of a slightly modified version of the code you
shared, it seems the difference comes down to NumPy's sorting
implementation simply being faster for ints than unicode strings. In
particular, it looks like string_quicksort_ is two
or three times slower than quicksort_ when passed the
same data.

We could probably add a special case in the sorting code to improve
performance for sorting single-character arrays. I have no idea if that
would be complicated or would make the code difficult to deal with. I'll
also note that string sorting is a more general problem than integer
sorting, since a generic string sort can't assume that it is handed
single-character strings.

Note also that U1 arrays are arrays of a single *unicode* character, which
in NumPy is stored as a 4-byte codepoint. If all you care about is ASCII or
Latin-1 characters, an S1 encoding will be a bit faster. On my machine,
using S1 makes unique_basic(charlist_10k) go from 466 us to 400 us.
However, I can also rewrite unique_view in that case to cast to int8, which
takes the runtime for unique_view(charlist_10k) from 172 us to 155 us.
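For reference, the S1/int8 rewrite mentioned above looks like this (the
helper name is mine, not from the thread; since S1 elements are single
bytes, the int8 view is a zero-copy reinterpretation):

```python
import numpy as np

chars = np.array(list("ASDFGHJKLZ" * 1000), dtype="S1")

def unique_view_s1(x):
    # Sort/unique on the int8 byte codes, then reinterpret as S1 again
    return np.unique(x.view(np.int8)).view(x.dtype)

print(unique_view_s1(chars).tolist())
# [b'A', b'D', b'F', b'G', b'H', b'J', b'K', b'L', b'S', b'Z']
```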


On Thu, Sep 14, 2023 at 8:10 AM  wrote:

> Hello -
>
> In the course of some genomics simulations, I seem to have come across a
> curious (to me at least) performance difference in np.unique that I wanted
> to share. (If this is not the right forum for this, please let me know!)
>
> With a np.array of characters (U1), np.unique seems to be much faster when
> doing np.view as int -> np.unique -> np.view as U1 for arrays of decent
> size. I would not have expected this since np.unique knows what's coming in
> as S1 and could handle the view-stuff internally. I've played with this a
> number of ways (e.g. S1 vs U1; int32 vs int64; return_counts = True vs
> False; 100, 1000, or 10k elements) and seem to notice the same pattern. A
> short illustration below with U1, int32, return_counts = False, 10 vs 10k.
>
> I wonder if this is actually intended behavior, i.e. the view-stuff is
> actually a good idea for the user to think about and implement if
> appropriate for their usecase (as it is for me).
>
> Best regards,
> Shyam
>
>
> import numpy as np
>
> charlist_10 = np.array(list('ASDFGHJKLZ'), dtype='U1')
> charlist_10k = np.array(list('ASDFGHJKLZ' * 1000), dtype='U1')
>
> def unique_basic(x):
>     return np.unique(x)
>
> def unique_view(x):
>     return np.unique(x.view(np.int32)).view(x.dtype)
>
> In [27]: %timeit unique_basic(charlist_10)
> 2.17 µs ± 40.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>
> In [28]: %timeit unique_view(charlist_10)
> 2.53 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>
> In [29]: %timeit unique_basic(charlist_10k)
> 204 µs ± 4.61 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>
> In [30]: %timeit unique_view(charlist_10k)
> 66.7 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> In [31]: np.__version__
> Out[31]: '1.25.2'
>
>
>
> --
> Shyam Saladi
> https://shyam.saladi.org
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-20 Thread Nathan
On Wed, Sep 20, 2023 at 12:26 AM Warren Weckesser <
warren.weckes...@gmail.com> wrote:

>
>
> On Fri, Sep 15, 2023 at 3:18 PM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
> >
> >
> >
> > Sure, that would be fine.
> >
> > I have a few more comments and questions about the NEP that I'll finish
> up and send this weekend.
> >
>
> One more comment on the NEP...
>
> My first impression of the missing data API design is that
> it is more complicated than necessary. An alternative that
> is simpler--and is consistent with the pattern established for
> floats and datetimes--is to define a "not a string" value, say
> `np.nastring` or something similar, just like we have `nan` for
> floats and `nat` for datetimes. Its behavior could be what
> you called "nan-like".
>
> The handling of `np.nastring` would be an intrinsic part of the
> dtype, so there would be no need for the `na_object` parameter
> of `StringDType`. All `StringDType`s would handle `np.nastring`
> in the same consistent manner.
>
> The use-case for the string sentinel does not seem very
> compelling (but maybe I just don't understand the use-cases).
> If there is a real need here that is not covered by
> `np.nastring`, perhaps just a flag to control the repr of
> `np.nastring` for each StringDType instance would be enough?
>
> If there is an objection to a potential proliferation of
> "not a thing" special values, one for each type that can
> handle them, then perhaps a generic "not a value" (say
> `np.navalue`) could be created that, when assigned to an
> element of an array, results in the appropriate "not a thing"
> value actually being assigned. In a sense, I guess this NEP is
> proposing that, but it is reusing the floating point object
> `np.nan` as the generic "not a thing" value, and my preference
> is that, *if* we go with such a generic object, it is not
> the floating point value `nan` but a new thing with a name
> that reflects its purpose. (I guess Pandas users might be
> accustomed to `nan` being a generic sentinel for missing data,
> so its use doesn't feel as incohesive as it might to others.
> Passing a string array to `np.isnan()` just feels *wrong* to
> me.)
>
> Anyway, that's my 2¢.
>
> Warren
>
>
In addition to Ralf's points, I don't think it's possible for NumPy to
support all downstream usages of object string arrays without something
like what's in the NEP. Some downstream libraries want their NA sentinel to
not be comparable with

[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-20 Thread Nathan
On Wed, Sep 20, 2023 at 4:40 AM Kevin Sheppard 
wrote:

>
>
> On Wed, Sep 20, 2023 at 11:23 AM Ralf Gommers 
> wrote:
>
>>
>>
>> On Wed, Sep 20, 2023 at 8:26 AM Warren Weckesser <
>> warren.weckes...@gmail.com> wrote:
>>
>>> One more comment on the NEP...
>>>
>>> My first impression of the missing data API design is that
>>> it is more complicated than necessary. An alternative that
>>> is simpler--and is consistent with the pattern established for
>>> floats and datetimes--is to define a "not a string" value, say
>>> `np.nastring` or something similar, just like we have `nan` for
>>> floats and `nat` for datetimes. Its behavior could be what
>>> you called "nan-like".
>>>
>>
>> Float `np.nan` and datetime missing value sentinel are not all that
>> similar, and the latter was always a bit questionable (at least partially
>> it's a left-over of trying to introduce generic missing value support I
>> believe). `nan` is a float and part of C/C++ standards with well-defined
>> numerical behavior. In contrast, there is no `np.nat`; you can retrieve a
>> sentinel value with `np.datetime64("NaT")` only. I'm not sure if it's
>> possible to generate a NaT value with a regular operation on a datetime
>> array a la `np.array([1.5]) / 0.0`.
>>

[Numpy-discussion] Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-06 Thread Nathan
Hi all,

As part of the ongoing work on NEP 52 we are getting close to merging the
pull request that changes numpy.core to numpy._core.

While working on this we realized that NumPy pickle files include paths to
np.core in the pickle data. If we do nothing, switching np.core to
np._core will generate deprecation warnings when loading pickle files
generated by NumPy 1.x in NumPy 2.x, and NumPy 1.x will be unable to read
NumPy 2.x pickle files. Eventually, when NumPy 2.x completely removes the
stub np.core module, loading old pickle files will break.

The fix we have come up with is to add a new public NumpyUnpickler class
to both the main branch and the NumPy 1.26 maintenance branch. This allows
loading pickle files that were generated by NumPy 1.x and 2.x in either
version without any warnings or errors. Users who are loading old pickle
files will need to update their code to use NumpyUnpickler or create new
pickle files, and users who generate pickles with NumPy 2.x will need to
use NumpyUnpickler to read the files in NumPy 1.x.

We are using NumpyUnpickler internally for loading files in the npy file
format. Users with pickle data saved in npy files won't see warnings. Only
users who are storing data in pickle files directly and who want pickle
files written in one numpy version to load correctly in another numpy
version will run into trouble. The I/O docs already explicitly discourage
using pickles to share data files between people and organizations like
this.
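The thread doesn't show how NumpyUnpickler works, but the standard
technique for this kind of fix is the ``pickle.Unpickler.find_class``
hook, which lets a loader rewrite module paths before they are imported.
Here is a self-contained sketch of that technique; the class name and the
stand-in rename (``collectionz`` -> ``collections``, playing the role of
``numpy.core`` -> ``numpy._core`` so the example runs without NumPy) are
illustrative, not the actual NumPy implementation:

```python
import io
import pickle
from collections import OrderedDict

class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites stale module paths before importing them."""
    # Illustrative mapping; a NumPy version would map e.g.
    # "numpy.core.multiarray" -> "numpy._core.multiarray".
    RENAMES = {"collectionz": "collections"}

    def find_class(self, module, name):
        for old, new in self.RENAMES.items():
            if module == old or module.startswith(old + "."):
                module = new + module[len(old):]
                break
        return super().find_class(module, name)

# Simulate a pickle written against a since-renamed module. Protocol 2
# stores the module path as plain text, so a same-length byte swap works.
data = pickle.dumps(OrderedDict(a=1), protocol=2)
stale = data.replace(b"collections", b"collectionz")

# pickle.loads(stale) would fail with ModuleNotFoundError;
# the renaming unpickler loads it cleanly.
obj = RenamingUnpickler(io.BytesIO(stale)).load()
print(obj == OrderedDict(a=1))  # True
```

The same hook also gives a natural place to accumulate future
compatibility shims, since every global reference in a pickle passes
through it.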

An alternate approach which would require less work for users would be to
leave a limited subset of functionality in `np.core` needed for loading
pickle files undeprecated. We would prefer to avoid doing this both because
it would leave behind a publicly visible `np.core` module in NumPy's public
API and because we're not sure if we can come up with a complete set of
imports that should be allowed without warning from `np.core` without
missing some corner cases and users will see deprecation warnings when
loading pickles anyway.

See https://github.com/numpy/numpy/pull/24866,
https://github.com/numpy/numpy/issues/24844, and the discussion in
https://github.com/numpy/numpy/pull/24634 for more context.

-Nathan


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-08 Thread Nathan
I don’t think this will be a problem for using pickle for IPC.

For the python multiprocessing module, all processes would be running the
same numpy version, so there wouldn’t be a problem.

It could be an issue if pickle is used to communicate numpy arrays between
a subset of workers running numpy 1.x and a subset running numpy 2.x in a
distributed workflow that uses pickles for IPC. Even then, it would be a
straightforward code fix, since in that case the code running the
distributed computation would be using pickles directly, not using
multiprocessing.

On Sun, Oct 8, 2023 at 12:51 PM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> If needed I can try to construct a minimal example for testing purposes.


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Nathan
On Mon, Oct 9, 2023 at 12:57 AM Aaron Meurer  wrote:

> Is it possible to convert a NumPy 1 pickle file into a generic pickle
> file that works in both NumPy 1 and 2? As far as I understand, pickle
> is Turing complete, so I imagine it should be theoretically possible,
> but I don't know how easy it would be to actually do this or how it
> would affect the pickle file size.
>

Hi Aaron,

The issue is that the pickle protocol needs a reference to a reconstructor
to recreate numpy types. For ndarray, that function is currently
`numpy.core.multiarray._reconstruct` and in numpy 2 becomes
`numpy._core.multiarray._reconstruct`. For a pickle file containing only an
ndarray, this is the first thing in the pickle file and the import happens
inside of the pickle implementation. I am not aware of a hook that Python
gives us to intercept that path before Python imports it.

So, even if there is a way to correct subsequent paths in the pickle file,
we won't be able to fix the most problematic path that will occur in any
pickle that includes a numpy array. That means some user-visible pain no
matter what. If we can't avoid that, I'd prefer to offer a solution that
will allow people to continue loading old pickle files indefinitely (albeit
with a minor code change). This also gives us a place to put compatibility
fixes for future changes that impact old pickle files.

-Nathan



>
> Aaron Meurer
>
> On Fri, Oct 6, 2023 at 10:17 AM Nathan  wrote:
> >
> > Hi all,
> >
> > As part of the ongoing work on NEP 52 we are getting close to merging
> the pull request that changes numpy.core to numpy._core.
> >
> > While working on this we realized that numpy pickle files include paths
> to np.core in the pickle data. If we do nothing, switching np.core to
> np._core will generate deprecation warnings when loading pickle files
> generated by Numpy 1.x in Numpy 2.x and Numpy 1.x will be unable to read
> Numpy 2.x pickle files. Eventually, when Numpy 2.x completely removes the
> stub np.core module, loading old pickle files will break.
> >
> > The fix we have come up with is to add a new public NumpyUnpickler class
> to both the main branch and the Numpy 1.26 maintenance branch. This allows
> loading pickle files that were generated by Numpy 1.x and 2.x in either
> version without any warnings or errors. Users who are loading old pickle
> files will need to update their code to use NumpyUnpickler or create new
> pickle files and users who generate pickles with numpy 2.x will need to use
> NumpyUnpickler to read the files in numpy 1.x.
> >
> > We are using NumpyUnpickler internally for loading files in the npy file
> format. Users with pickle data saved in npy files won't see warnings. Only
> users who are storing data in pickle files directly and who want pickle
> files written in one numpy version to load correctly in another numpy
> version will run into trouble. The I/O docs already explicitly discourage
> using pickles to share data files between people and organizations like
> this.
> >
> > An alternate approach which would require less work for users would be
> to leave a limited subset of functionality in `np.core` needed for loading
> pickle files undeprecated. We would prefer to avoid doing this both because
> it would leave behind a publicly visible `np.core` module in NumPy's public
> API and because we're not sure if we can come up with a complete set of
> imports that should be allowed without warning from `np.core` without
> missing some corner cases, and users will see deprecation warnings when
> loading pickles anyway.
> >
> > See https://github.com/numpy/numpy/pull/24866,
> https://github.com/numpy/numpy/issues/24844, and the discussion in
> https://github.com/numpy/numpy/pull/24634 for more context.
> >
> > -Nathan
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: asmeu...@gmail.com


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Nathan
On Mon, Oct 9, 2023 at 2:44 PM Oscar Benjamin 
wrote:

> On Mon, 9 Oct 2023 at 17:03, Nathan  wrote:
> >
> > On Mon, Oct 9, 2023 at 12:57 AM Aaron Meurer  wrote:
> >>
> >> Is it possible to convert a NumPy 1 pickle file into a generic pickle
> >> file that works in both NumPy 1 and 2? As far as I understand, pickle
> >> is Turing complete, so I imagine it should be theoretically possible,
> >> but I don't know how easy it would be to actually do this or how it
> >> would affect the pickle file size.
>
> There are many ways that this could be made to work with the various
> options like `__reduce__()` etc.
>
> > The issue is that the pickle protocol needs a reference to a
> reconstructor to recreate numpy types. For ndarray, that function is
> currently `numpy.core.multiarray._reconstruct` and in numpy 2 becomes
> `numpy._core.multiarray._reconstruct`. For a pickle file containing only an
> ndarray, this is the first thing in the pickle file and the import happens
> inside of the pickle implementation. I am not aware of a hook that Python
> gives us to intercept that path before Python imports it.
> >
> > So, even if there is a way to correct subsequent paths in the pickle
> file, we won't be able to fix the most problematic path that will occur in
> any pickle that includes a numpy array. That means some user-visible pain
> no matter what. If we can't avoid that, I'd prefer to offer a solution that
> will allow people to continue loading old pickle files indefinitely (albeit
> with a minor code change). This also gives us a place to put compatibility
> fixes for future changes that impact old pickle files.
>
>
Hi Oscar,


> Suppose that there is NumPy v1 and that in future there will be NumPy
> v2. Also suppose that there will be two NumPy pickle formats fmtA and
> a future fmtB. One possibility is that NumPy v1 only reads and writes
> fmtA and then NumPy v2 only reads and writes fmtB. One problem with
> this is that when NumPy v2 comes out there is no easy way to convert
> pickles from fmtA to fmtB for compatibility with NumPy v2. Another
> problem with this is that it does not make a nice transition during
> any period of time when both NumPy v1 and v2 might be used in
> different parts of a software stack.
>

Doesn't NumpyUnpickler solve this? It will be present in both v1 and v2 and
will allow loading files that reference either np.core or np._core in either
version.
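Since NumpyUnpickler hasn't been merged yet, here is a minimal sketch of the
remapping idea (my own illustration, not the proposed implementation), based
on overriding `pickle.Unpickler.find_class`:

```python
import importlib
import pickle


class NumpyUnpickler(pickle.Unpickler):
    """Sketch: resolve pickled globals under both numpy.core and numpy._core."""

    def find_class(self, module, name):
        candidates = [module]
        if "numpy.core" in module:
            candidates.append(module.replace("numpy.core", "numpy._core"))
        elif "numpy._core" in module:
            candidates.append(module.replace("numpy._core", "numpy.core"))
        for candidate in candidates:
            try:
                # Try the recorded path first, then the renamed spelling.
                return getattr(importlib.import_module(candidate), name)
            except (ImportError, AttributeError):
                continue
        raise pickle.UnpicklingError(f"cannot resolve {module}.{name}")
```

With this, `NumpyUnpickler(file).load()` works regardless of which major
version wrote the pickle, since one of the two spellings will import.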


> An alternative is to introduce fmtB as part of the NumPy v1 series.
> NumPy could be changed now so that it can read both fmtA and fmtB but
> by default it would write fmtB which would be designed ahead of time
> so that in future NumPy v2 would be able to read fmtB as well. It
> would also be possible to design it so that fmtB would be readable by
> older versions of NumPy that were released before fmtB was designed.
>
> Then there is a version of NumPy (v1) which can read fmtA and write to
> fmtB. This version of NumPy can be used to convert pickles from fmtA
> to fmtB. Then when NumPy v2 is released it can already read any
> pickles that were generated by the most recent releases of NumPy v1.x.
> Anyone who still has older pickles in fmtA could use NumPy v1 to do
> dumps(loads(f)) which would convert from fmtA to fmtB.
>
> In this scenario the only part that does not work is reading fmtA in
> NumPy v2 which is unavoidable if numpy.core is removed or renamed in
> v2.
>

I agree it would have been better to anticipate this and move the
_reconstruct function to np._core many releases ago. Sadly this was not
done and the next release is Numpy 2.0.

I also want to emphasize that using pickle like this - to share data
between different python installations - is inherently insecure and should
never be done outside of an organization that fully controls all of the
python installations. In that case, the organization can use
NumpyUnpickler. In any other case, I think it's good to perhaps nudge
people away from doing things like this.


>
> --
> Oscar


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Nathan
On Mon, Oct 9, 2023 at 3:12 PM Oscar Benjamin 
wrote:

> On Mon, 9 Oct 2023 at 21:57, Nathan  wrote:
> >
> > On Mon, Oct 9, 2023 at 2:44 PM Oscar Benjamin <
> oscar.j.benja...@gmail.com> wrote:
> >> Suppose that there is NumPy v1 and that in future there will be NumPy
> >> v2. Also suppose that there will be two NumPy pickle formats fmtA and
> >> a future fmtB. One possibility is that NumPy v1 only reads and writes
> >> fmtA and then NumPy v2 only reads and writes fmtB. One problem with
> >> this is that when NumPy v2 comes out there is no easy way to convert
> >> pickles from fmtA to fmtB for compatibility with NumPy v2. Another
> >> problem with this is that it does not make a nice transition during
> >> any period of time when both NumPy v1 and v2 might be used in
> >> different parts of a software stack.
> >
> > Doesn't NumpyUnpickler solve this? It will be present in both v1 and v2
> and will allow loading files that reference either np.core or np._core in either version.
>
> I guess that makes it possible in some way to convert formats in
> either version. I presume though that this still means that a plain
> pickle.loads() (and any code built on top of such) would fail in v2.
>

In NumPy 2.0 you would see a deprecation warning about the path in the
pickle file but no crash. Eventually, when we finally remove the stub
np.core, you would see a crash.

However, one thing we can do now, for that one particular symbol that we
know is going to be in every pickle file and probably nowhere else, is
intercept that one import and, instead of generating a generic warning about
np.core being deprecated, make that specific version of the deprecation
warning mention NumpyUnpickler. I'll make sure this gets done.

We *could* just allow that import to happen without a warning, but then
we're stuck keeping np.core around even longer, and we will still generate a
deprecation warning for imports from np.core if the pickle file happens to
include any other numpy types whose reconstruction imports from np.core.


>
> >> An alternative is to introduce fmtB as part of the NumPy v1 series.
> >> NumPy could be changed now so that it can read both fmtA and fmtB but
> >> by default it would write fmtB which would be designed ahead of time
> >> so that in future NumPy v2 would be able to read fmtB as well. It
> >> would also be possible to design it so that fmtB would be readable by
> >> older versions of NumPy that were released before fmtB was designed.
> >>
> >> Then there is a version of NumPy (v1) which can read fmtA and write to
> >> fmtB. This version of NumPy can be used to convert pickles from fmtA
> >> to fmtB. Then when NumPy v2 is released it can already read any
> >> pickles that were generated by the most recent releases of NumPy v1.x.
> >> Anyone who still has older pickles in fmtA could use NumPy v1 to do
> >> dumps(loads(f)) which would convert from fmtA to fmtB.
> >>
> >> In this scenario the only part that does not work is reading fmtA in
> >> NumPy v2 which is unavoidable if numpy.core is removed or renamed in
> >> v2.
> >
> > I agree it would have been better to anticipate this and move the
> _reconstruct function to np._core many releases ago. Sadly this was not
> done and the next release is Numpy 2.0.
>
> Well if the next release is NumPy 2.0 then my suggestion does not
> work. There are alternatives but they might not be worth it at this
> point.
>
> > I also want to emphasize that using pickle like this - to share data
> between different python installations - is inherently insecure and should
> never be done outside of an organization that fully controls all of the
> python installations. In that case, the organization can use
> NumpyUnpickler. In any other case, I think it's good to perhaps nudge
> people away from doing things like this.
>
> Agreed but I guarantee that someone depends on this and is using it in
> a way that is reasonable for their own purposes. There might not be
> much to be done about it but someone will experience unexpected
> breakage and it is worthwhile to contemplate (as you are doing) what
> can be done to mitigate that.
>
> --
> Oscar


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-09 Thread Nathan
On Mon, Oct 9, 2023 at 3:58 PM Oscar Benjamin 
wrote:

> On Mon, 9 Oct 2023 at 22:30, Nathan  wrote:
> >
> > On Mon, Oct 9, 2023 at 3:12 PM Oscar Benjamin <
> oscar.j.benja...@gmail.com> wrote:
> >>
> >> On Mon, 9 Oct 2023 at 21:57, Nathan  wrote:
> >> >
> >> > On Mon, Oct 9, 2023 at 2:44 PM Oscar Benjamin <
> oscar.j.benja...@gmail.com> wrote:
> >> >> Suppose that there is NumPy v1 and that in future there will be NumPy
> >> >> v2. Also suppose that there will be two NumPy pickle formats fmtA and
> >> >> a future fmtB. One possibility is that NumPy v1 only reads and writes
> >> >> fmtA and then NumPy v2 only reads and writes fmtB. One problem with
> >> >> this is that when NumPy v2 comes out there is no easy way to convert
> >> >> pickles from fmtA to fmtB for compatibility with NumPy v2. Another
> >> >> problem with this is that it does not make a nice transition during
> >> >> any period of time when both NumPy v1 and v2 might be used in
> >> >> different parts of a software stack.
> >> >
> >> > Doesn't NumpyUnpickler solve this? It will be present in both v1 and
> v2 and will allow loading files that reference either np.core or np._core in either
> version.
> >>
> >> I guess that makes it possible in some way to convert formats in
> >> either version. I presume though that this still means that a plain
> >> pickle.loads() (and any code built on top of such) would fail in v2.
> >
> > In NumPy 2.0 you would see a deprecation warning about the path in the
> pickle file but no crash. Eventually, when we finally remove the stub
> np.core, you would see a crash.
>
> Okay, that makes sense. What happens in the reverse scenario: loading
> a pickle generated by NumPy 2.0 using NumPy 1.x?


There would be a crash, so people creating these pickles would need to tell
users to load them using NumpyUnpickler. Do you see that as problematic?
It would only impact newly created pickle files, and there would be an
immediate fix available: use NumpyUnpickler.load instead of pickle.load.


>
> --
> Oscar


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-10 Thread Nathan
On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> I have one more use case to consider from our ecosystem.
>
> We dump numpy arrays into a MongoDB using GridFS for subsequent
> visualization, some snippets:
>
> '''Python
> with BytesIO() as BIO:
>     np.save(BIO, numpy_array)
>     serialized_A = BIO.getvalue()
> filehandle_id = self.representations_files.put(serialized_A)
> '''
>
> and then restore them in the other application:
>
> '''Python
> numpy_array = np.load(BytesIO(serialized_A))
> '''
> For us this is for development work only and I am less concerned about
> having mixed versions in my database, but in principle that is a scenario.
> But it seems to me that for this to work the reading application needs to
> be migrated to version 2 and temporarily extended with the NumpyUnpickler
> before the writing application is migrated. Or they need to be migrated at
> the same time. Is that correct?


np.save and np.load will use NumpyUnpickler under the hood, so you won’t
have any issues; you would only have issues if you saved or loaded pickles
using the pickle module directly.
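For a plain numeric array like in the snippets above, the npy format doesn't
involve pickle at all, so the round trip is version-agnostic; a self-contained
sketch mirroring those snippets (minus the GridFS calls):

```python
import io

import numpy as np

numpy_array = np.arange(10.0)

# Serialize to bytes, as done before handing the payload to GridFS.
with io.BytesIO() as bio:
    np.save(bio, numpy_array)
    serialized = bio.getvalue()

# Restore in the reading application.
restored = np.load(io.BytesIO(serialized))
```

Pickle (and hence `allow_pickle=True` on `np.load`) only enters the picture
for object arrays; arrays of plain numeric dtypes are stored as raw bytes
plus a small header.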





[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-11 Thread Nathan
On Wed, Oct 11, 2023 at 4:24 PM Mateusz Sokol  wrote:

> Hi! Thank you for all your feedback this week!
>
> We have made a decision to take a less disruptive option that we
> considered and that came up in this discussion.
>
> We back out of the `NumpyUnpickler` class solution for reading pickles
> across major NumPy versions.
>
> Instead, we will retain `numpy.core` stubs in NumPy 2.0 to allow loading
> NumPy 1.x pickles.
> Additionally, `numpy._core` stubs will be backported to 1.26 to ensure
> compatibility the other way around - loading NumPy 2.0 pickles with NumPy
> 1.26 installed.
>
> Both major versions will continue to create pickles with their own
> contents (NumPy 1.26 with `numpy.core` paths and NumPy 2.0 with
> `numpy._core` paths).
>
> This way any pickle will be loadable by both major versions.
>

Thanks for the summary Mateusz!

I want to add that there will still be module-level `__getattr__`
implementations that will emit deprecation warnings on any attribute
access in `np.core`, `numpy.core.multiarray` or
`numpy.core._multiarray_umath`, but direct imports will not generate any
warnings. Since pickles directly import types that appear in pickle files,
loading a pickle that refers to types or functions in these modules won’t
generate any warnings.

Searching on github indicates that direct imports like this are relatively
rare in user code, which tends to either just import the top-level numpy
module and use attribute access or use `from` imports, which both invoke
the module-level `__getattr__`. Hopefully we’ll get most of the benefit of
alerting users that they are using private internals without needing to
break old pickles.
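The attribute-access/direct-import distinction falls out of PEP 562: a
module-level `__getattr__` only runs for names that are *not* already in the
module's namespace. A toy stand-in module (not numpy's actual stub code) makes
that concrete:

```python
import types
import warnings

# Build a stand-in for a numpy.core-style stub module.
stub = types.ModuleType("fake_core")
stub.multiarray = object()  # names defined in the stub resolve silently


def _deprecated_getattr(name):  # PEP 562 fallback for missing names
    warnings.warn(
        f"fake_core.{name} is deprecated", DeprecationWarning, stacklevel=2
    )
    raise AttributeError(name)


stub.__getattr__ = _deprecated_getattr
```

Accessing `stub.multiarray` is silent because the name is in the module's
`__dict__`; any other name falls through to `__getattr__` and warns, which is
the behavior described above for the real stubs.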


>
>
> On Tue, Oct 10, 2023 at 3:33 PM Nathan  wrote:
>
>>
>>
>> On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg <
>> r.a.j.van.elb...@hetnet.nl> wrote:
>>
>>> I have one more useCase to consider from our ecosystem.
>>>
>>> We dump numpy arrays into a MongoDB using GridFS for subsequent
>>> visualization, some snippets:
>>>
>>> '''Python
>>> with BytesIO() as BIO:
>>> np.save(BIO, numpy_array)
>>> serialized_A = BIO.getvalue()
>>> filehandle_id = self.representations_files.put(serialized_A)
>>> '''
>>>
>>> and then restore them in the other application:
>>>
>>> '''Python
>>> numpy_array = np.load(BytesIO(serializedA))
>>> '''
>>> For us this is for development work only and I am less concerned about
>>> having mixed versions in my database, but in principle that is a scenario.
>>> But it seems to me that for this to work the reading application needs to
>>> be migrated to version 2 and temporarily extended with the NumpyUnpickler
>>> before the writing application is migrated. Or they need to be migrated at
>>> the same time. Is that correct?
>>
>>
>> np.save and np.load will use NumpyUnpickler under the hood so you won’t
>> have any issues, you would only have issues if you saved or loaded pickles
>> using the pickle module directly.
>>
>>
>>


[Numpy-discussion] NEP 55 Updates and call for testing

2023-11-22 Thread Nathan
Hi all,

This week I updated NEP 55 to reflect the changes I made to the prototype
since I initially sent out the NEP. The updated NEP is available on the NumPy
website: https://numpy.org/neps/nep-0055-string_dtype.html.

Updates to the NEP
++++++++++++++++++

The changes since the original version of the NEP focus on fully defining the
C API surface we would like to add to the NumPy C API and an implementation
of a per-dtype-instance arena allocator to manage heap allocations. This
enabled major improvements to the prototype, including implementing the small
string optimization and locking all access to heap memory behind a
fine-grained mutex, which should prevent segfaults or memory corruption in a
multithreaded context. Thanks to Warren Weckesser for his proof-of-concept
code and help with the small string optimization implementation; he has been
added as an author to reflect his contributions.

With these changes the stringdtype prototype is feature complete.

Call to Review NEP 55
+++++++++++++++++++++

I'm requesting another round of review on the NEP with an eye toward
acceptance before the NumPy 2.0 release branch is created from main. If I can
manage it, my plan is to have a pull request open that merges the stringdtype
codebase into NumPy before the branch is created. That said, if we decide
that we need more time, or if some issue comes up, I'm happy with this going
into main after the NumPy 2.0 release branch is created.

The most significant feedback we have not addressed from the last round of
review was Warren's suggestion to add a default missing data sentinel to
NumPy itself. For reasons outlined in the NEP and in my reply to Warren from
earlier this year, we do not want to add a missing data singleton to NumPy,
instead leaving it to users to choose the missing data semantics they prefer.
Otherwise I believe the current draft addresses all outstanding feedback from
the last round of review.

Help me Test the Prototype!
+++++++++++++++++++++++++++

If anyone has time and interest, I would also very much appreciate some
testing and tire-kicking on the stringdtype prototype, available at
https://github.com/numpy/numpy-user-dtypes.

There is a README with build instructions here:
https://github.com/numpy/numpy-user-dtypes/blob/main/stringdtype/README.md

If you have a Python development environment with a C compiler, it should be
straightforward to build, install, and test the prototype. Note that you must
have `NUMPY_EXPERIMENTAL_DTYPE_API=1` set in your shell environment or via
`os.environ` to import stringdtype without error.
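The variable just needs to be set before the import happens; from Python, for
example (the variable name comes from the README above):

```python
import os

# Must be set before importing stringdtype, or the import raises an error.
os.environ["NUMPY_EXPERIMENTAL_DTYPE_API"] = "1"

# import stringdtype  # would now succeed in an env with the prototype built
```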

I'm particularly interested to hear experiences converting code to use
stringdtype. This could be code using fixed-width strings in a situation
where a variable-length string array makes more sense, or code using object
string arrays. Are there pain points that aren't discussed in the NEP or
existing workflows that cannot be adapted to use stringdtype? As far as I'm
aware there aren't, but more testing will help catch issues before we've
stabilized everything.

My fork of pandas might be a source of inspiration for porting an existing
non-trivial codebase that used object string arrays:

https://github.com/pandas-dev/pandas/compare/main...ngoldbaum:pandas:stringdtype

Thanks all for your time, attention, and help reviewing the NEP!

-Nathan


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-26 Thread Nathan
I want to caution about using `pip install -e .` to get a development
install of numpy. This will work fine when working on numpy itself, but won’t
be enough if you need to use the development version of numpy to build
another library. This doesn’t work because in-place installs don’t install
the numpy headers into the git repo (arguably it was a bug that the old
setuptools install did), so the include paths `np.get_include()` reports
won’t be correct.

See this meson-python issue:
https://github.com/mesonbuild/meson-python/issues/429

For my work I tend to use a persistent build directory with build isolation
disabled as discussed in the meson-python docs. This gives me fast rebuilds
without using an in-place build. It does mean there’s a build and install
step when you edit python code in numpy that would otherwise be unnecessary
and sometimes the cache can go stale for reasons that aren’t totally
obvious.

In principle numpy could fix this by ensuring the headers get generated in
the git repo in the place they’re supposed to be installed. I have no idea
how hard it would be beyond that it would definitely require messing with
the codegen scripts.
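A quick way to check whether a given numpy install carries usable headers is
to look for one under `np.get_include()`; with a broken in-place build of
numpy itself this check can fail:

```python
# Sanity-check that the include path numpy reports actually contains the
# C headers a downstream extension would compile against.
import os

import numpy as np

include_dir = np.get_include()
header = os.path.join(include_dir, "numpy", "arrayobject.h")
```

For a normal wheel or `pip install .` install, `header` points at a real
file; for an editable install of numpy's git repo it may not.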

On Sun, Nov 26, 2023 at 10:53 AM Stefan van der Walt via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Hi Doug,
>
> On Sun, Nov 26, 2023, at 06:29, Doug Turnbull wrote:
>
> To debug, I ran `pip install . --no-build-isolation` it worked (using
> venv's numpy)
>
>
> When developing NumPy, we typically build in the existing environment.
> This is done either via `pip install -e .` (which installs hooks to trigger
> a re-compile upon import), or via the spin tool (
> https://github.com/scientific-python/spin), which have meson commands
> pre-bundled:
>
> pip install spin
> spin  # lists commands available
>
> Best regards,
> Stéfan
>


[Numpy-discussion] Re: Meson - C extension - Finding numpy includes in virtual env

2023-11-28 Thread Nathan
It looks like `spin build` does `meson build` and `meson install` and
doesn't do `pip install`. I'd like numpy to be importable in a python
environment of my choosing, so I tend to instead manually install numpy
into that environment by invoking pip with something like `python -m pip
install . -v --no-build-isolation -Cbuilddir=build -C'compile_args=-v'
-C'setup_args=-Dbuildtype=debug'`. I like seeing the compile command meson
uses, so I pass in `-v` through meson's `compile_args` and I often need a
debug build, so I set the build type manually as well.

I could probably get the same effect by either manually activating the spin
python environment (not sure how to do that) or using `spin run` somehow
outside of the numpy tree, but what I have seems to work OK for me now so I
haven't tried to mess with spin more.

On Sun, Nov 26, 2023 at 1:57 PM Stefan van der Walt 
wrote:

> On Sun, Nov 26, 2023, at 12:03, Nathan wrote:
>
> For my work I tend to use a persistent build directory with build
> isolation disabled as discussed in the meson-python docs.
>
>
> Out of curiosity, how is this different from, e.g., `spin build` which
> builds into `./build-install`?
>
> Stéfan
>
>


[Numpy-discussion] Re: NEP 55 Updates and call for testing

2023-12-08 Thread Nathan
I just opened a draft PR to include stringdtype in numpy:
https://github.com/numpy/numpy/pull/25347

If you are interested in testing the new dtype but haven't had the chance
yet, hopefully this should be easier to test. From a clone of the NumPy
repo, doing:

$ git fetch https://github.com/ngoldbaum/numpy stringdtype:stringdtype
$ git checkout stringdtype
$ git submodule update --init
$ python -m pip install .

should build and install a version of NumPy that includes stringdtype,
importable as `np.dtypes.StringDType`. Note that this is based on numpy 2.0
dev, so if you need to use another package that depends on NumPy's ABI to
test the dtype, you'll need to rebuild that project as well.

I'll be continuing to work on this PR to finish integrating stringdtype
into NumPy and write documentation.

If anyone has any feedback on any aspect of the NEP or the stringdtype code
please reply here, on github, or reach out to me privately.

On Wed, Nov 22, 2023 at 1:22 PM Nathan  wrote:

> Hi all,
>
> This week I updated NEP 55 to reflect the changes I made to the prototype
> since
> I initially sent out the NEP. The updated NEP is available on the NumPy
> website:
> https://numpy.org/neps/nep-0055-string_dtype.html.
>
> Updates to the NEP
> ++++++++++++++++++
>
> The changes since the original version of the NEP focus on fully defining
> the C
> API surface we would like to add to the NumPy C API and an implementation
> of a
> per-dtype-instance arena allocator to manage heap allocations. This enabled
> major improvements to the prototype, including implementing the small
> string
> optimization and locking all access to heap memory behind a fine-grained
> mutex
> which should prevent segfaults or memory corruption in a multithreaded
> context. Thanks to Warren Weckesser for his proof-of-concept code and help
> with the small string optimization implementation; he has been added as an
> author to reflect his contributions.
>
> With these changes the stringdtype prototype is feature complete.
>
> Call to Review NEP 55
> +++++++++++++++++++++
>
> I'm requesting another round of review on the NEP with an eye toward
> acceptance
> before the NumPy 2.0 release branch is created from main. If I can manage
> it, my
> plan is to have a pull request open that merges the stringdtype codebase
> into
> NumPy before the branch is created. That said, if we decide that we need
> more
> time, or if some issue comes up, I'm happy with this going into main after
> the
> NumPy 2.0 release branch is created.
>
> The most significant feedback we have not addressed from the last round of
> review was Warren's suggestion to add a default missing data sentinel to
> NumPy
> itself. For reasons outlined in the NEP and in my reply to Warren from
> earlier
> this year, we do not want to add a missing data singleton to NumPy, instead
> leaving it to users to choose the missing data semantics they prefer.
> Otherwise I
> believe the current draft addresses all outstanding feedback from the last
> round of review.
>
> Help me Test the Prototype!
> +++++++++++++++++++++++++++
>
> If anyone has time and interest, I would also very much appreciate some
> testing
> and tire-kicking on the stringdtype prototype, available at
> https://github.com/numpy/numpy-user-dtypes.
>
> There is a README with build instructions here:
> https://github.com/numpy/numpy-user-dtypes/blob/main/stringdtype/README.md
>
> If you have a Python development environment with a C compiler, it should
> be
> straightforward to build, install, and test the prototype. Note that you
> must
> have `NUMPY_EXPERIMENTAL_DTYPE_API=1` set in your shell environment or via
> `os.environ` to import stringdtype without error.
>
> I'm particularly interested to hear experiences converting code to use
> stringdtype. This could be code using fixed-width strings in a situation
> where a
> variable-length string array makes more sense or code using object string
> arrays. Are there pain points that aren't discussed in the NEP or existing
> workflows that cannot be adapted to use stringdtype? As far as I'm aware
> there
> aren't, but more testing will help catch issues before we've stabilized
> everything.
>
> My fork of pandas might be a source of inspiration for porting an existing
> non-trivial
> codebase that used object string arrays:
>
>
> https://github.com/pandas-dev/pandas/compare/main...ngoldbaum:pandas:stringdtype
>
> Thanks all for your time, attention, and help reviewing the NEP!
>
> -Nathan
>
>
>


[Numpy-discussion] Proposal to accept NEP 55: Add a UTF-8 variable-width string DType to NumPy

2024-01-22 Thread Nathan
Hi all,

I propose we accept NEP 55 and merge PR #25347 implementing the NEP in time
for the NumPy 2.0 RC:

https://numpy.org/neps/nep-0055-string_dtype.html
https://github.com/numpy/numpy/pull/25347

The most controversial aspect of the NEP was support for missing strings
via a user-supplied sentinel object. In the previous discussion on the
mailing list, Warren Weckesser argued for shipping a missing data sentinel
with NumPy for use with the DType, while in code review and the PR for the
NEP, Sebastian expressed concern about the additional complexity of
including missing data support at all.

I found that supporting missing data is key to efficiently supporting the
new DType in Pandas. I think that argues that we need some level of missing
data support to fully replace object string arrays. I believe the
compromise proposal in the NEP is sufficient for downstream libraries while
limiting additional complexity elsewhere in NumPy.

Concerns raised in previous discussions about concretely specifying the C
API to be made public, preventing use-after-free errors in a multithreaded
context, and uncertainty around the arena allocator implementation have
been resolved in the latest version of the NEP and the open PR.
Additionally, due to some excellent and timely work by Lysandros Nikolaou,
we now have a number of string ufuncs in NumPy and a straightforward plan
to add more. Loops have been implemented for all the ufuncs added in the
NumPy 2.0 dev cycle so far.

I would like to see us ship the DType in NumPy 2.0. This will allow us to
advertise a major new feature, will spur efforts to support new DTypes in
downstream libraries, and will allow us to get feedback from the community
that would be difficult to obtain without releasing the code into the wild.
Additionally, I am funded via a NASA ROSES grant for work related to this
effort until the end of 2024, so including the DType in NumPy 2.0 will more
efficiently use my funded time to fix issues.

If there are no substantive objections to this email, then the NEP will be
considered accepted; see NEP 0 for more details:
https://numpy.org/neps/nep-.html


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2024-02-12 Thread Nathan
On Mon, Feb 12, 2024 at 1:47 PM Jim Pivarski  wrote:

> Hi,
>
> I know that I'm a little late to be asking about this, but I don't see a
> comment elsewhere on it (in the NEP, the implementation PR #25347, or this
> email thread).
>
> As I understand it, the new StringDType implementation distinguishes 3
> types of individual strings, any of which can be present in an array:
>
>1. short strings, included inline in the array (at most 15 bytes on a
>64-bit system)
>2. arena-allocated strings, which are managed by the
>npy_string_allocator
>3. heap-allocated strings, which are pointers anywhere in RAM.
>
> Does case 3 include strings that are passed to the array as views, without
> copying? If so, then the ownership of strings would either need to be
> tracked on a per-string basis (distinct from the array_owned boolean,
> which characterizes the whole array), or they need to all be considered
> stolen references (NumPy will free all of them when the array goes out of
> scope), or they all need to be considered borrowed references (NumPy will
> not free any of them when the array goes out of scope).
>

StringDType arrays don’t intern Python strings directly; there’s always a
copy. Array views are allowed, but I don’t think that’s what you’re talking
about. The mutex guarding access to the string data prevents arrays from
being garbage collected while a C thread holds a pointer to the string
data, at least assuming correct usage of the C API that doesn’t try to use
a string after releasing the allocator.


> If the array does not accept new strings as views, but always copies any
> externally provided string, then why distinguish between cases 2 and 3? How
> would an array end up with some strings being arena-allocated and other
> strings being heap-allocated?
>

You can only get a heap string entry in an array if you enlarge an entry in
the array. The goal with allowing heap strings like this was to have an
escape hatch that allows enlarging a single array entry without adding
complexity or needing to re-allocate the entire arena buffer.

For example, suppose you create an array with a short string entry and then
edit that entry to be longer than 15 bytes. Rather than appending to the
arena or re-allocating it, we convert the entry to a heap string.
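As a rough mental model, here is a hypothetical Python sketch of the rule described above; this is illustrative only and not NumPy's actual implementation, which lives in C:

```python
SHORT_STRING_MAX = 15  # bytes that fit inline in an array entry on a 64-bit system

def storage_class(utf8_data: bytes, fits_in_arena: bool) -> str:
    """Hypothetical sketch of how a string entry's storage class is chosen."""
    if len(utf8_data) <= SHORT_STRING_MAX:
        return "short"  # packed directly into the array buffer
    if fits_in_arena:
        return "arena"  # stored in the arena managed by the allocator
    return "heap"       # escape hatch: separately allocated, e.g. after enlarging

# Enlarging an entry past 15 bytes moves it off the inline representation:
print(storage_class(b"tiny", fits_in_arena=True))                   # short
print(storage_class(b"a string longer than fifteen bytes", False))  # heap
```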


> Thanks!
> -- Jim
>
>
>
>
> On Wed, Sep 20, 2023 at 10:25 AM Nathan  wrote:
>
>>
>>
>> On Wed, Sep 20, 2023 at 4:40 AM Kevin Sheppard <
>> kevin.k.shepp...@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Sep 20, 2023 at 11:23 AM Ralf Gommers 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Sep 20, 2023 at 8:26 AM Warren Weckesser <
>>>> warren.weckes...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 15, 2023 at 3:18 PM Warren Weckesser <
>>>>> warren.weckes...@gmail.com> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Sep 11, 2023 at 12:25 PM Nathan 
>>>>> wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser <
>>>>> warren.weckes...@gmail.com> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Aug 29, 2023 at 10:09 AM Nathan 
>>>>> wrote:
>>>>> >>> >
>>>>> >>> > The NEP was merged in draft form, see below.
>>>>> >>> >
>>>>> >>> > https://numpy.org/neps/nep-0055-string_dtype.html
>>>>> >>> >
>>>>> >>> > On Mon, Aug 21, 2023 at 2:36 PM Nathan <
>>>>> nathan.goldb...@gmail.com> wrote:
>>>>> >>> >>
>>>>> >>> >> Hello all,
>>>>> >>> >>
>>>>> >>> >> I just opened a pull request to add NEP 55, see
>>>>> https://github.com/numpy/numpy/pull/24483.
>>>>> >>> >>
>>>>> >>> >> Per NEP 0, I've copied everything up to the "detailed
>>>>> description" section below.
>>>>> >>> >>
>>>>> >>> >> I'm looking forward to your feedback on this.
>>>>> >>> >>
>>>>> >>> >> -Nathan Goldbaum
>>>>> >>> >>
>>>>> >>>
>>>>> >>&

[Numpy-discussion] New DType and ArrayMethod C APIs are public

2024-02-14 Thread Nathan
Hi all,

Today we merged the PR that publicly exposed the formerly "experimental"
DType API and ArrayMethod API. See
https://github.com/numpy/numpy/pull/25754.

The docs for the new C API are here:

https://numpy.org/devdocs/reference/c-api/array.html#arraymethod-api
https://numpy.org/devdocs/reference/c-api/types-and-structures.html#arraymethod-structs
https://numpy.org/devdocs/reference/c-api/array.html#custom-data-types
https://numpy.org/devdocs/reference/c-api/types-and-structures.html#dtypemeta

The DType API publicly exposes the PyArray_DTypeMeta C struct, which
represents DType metaclasses. It also exposes a function for registering
user-defined DTypes and a set of slot IDs and function typedefs that users
can implement in C to write new DTypes.

The ArrayMethod API allows defining cast and ufunc loops in terms of these
new DTypes, in a manner that forbids value-based promotion and abstracts
away many of NumPy's internals. We hope the ArrayMethod API enables sharing
low-level loops that work out-of-the-box in NumPy with other projects.

I used the DType API to write the new StringDType that was recently added
to NumPy. One goal of this API is to make it easier for the community to
write new DTypes; another is to let people experiment with DTypes outside
of NumPy, prove community need and viability, and then upstream them into
NumPy as self-contained artifacts without needing deep knowledge of NumPy
internals.

This is still a C API and it does require knowledge of the CPython and
NumPy C APIs and the correct way to use them, but it is now substantially
easier than it was before to write new DTypes.

It is our goal that new DTypes will be generally compatible with downstream
users of NumPy. If your project uses the NumPy python API, then it's likely
this is already the case, although there may still be some wrinkles if you
do introspection of DTypes. If you use the NumPy C API, and in particular,
if you use type numbers to specify numpy DTypes, it's likely that your C
code will need some updating to work properly with new DTypes.

All that to say, the 2.0 release still isn't final and we'd love feedback
on any part of this. This is all new API surface so we have a unique chance
to fix mistakes before the API is fully public in the final numpy 2.0
release. Things are still in a little flux - we still need to update the
example user DType implementations in the numpy-user-dtypes repo to use the
final public API - but now is probably as good a time as ever to start
writing a new DType if you've ever been interested. We have a #user-dtypes
channel on the numpy community slack if you're interested in chatting about
this in a low-latency context - contact me off-list if you want an invite.

This caps off several years of work from many people, including everyone
involved in NEPs 40-44, which describe this new API. In particular I'd like
to highlight Sebastian Berg, who led this whole effort; Matti Picus, Stéfan
van der Walt, Ben Nathanson, and Marten van Kerkwijk, who co-authored the
NEPs; Ralf Gommers, who helped get funding for my work and provided
mentoring and coordination; Charles Harris, for providing leadership,
context, and advice; and many others who have contributed in big and small
ways.

-Nathan


[Numpy-discussion] Re: Resources for newcomers and some questions about my first PR

2024-02-20 Thread Nathan
On Mon, Feb 19, 2024 at 3:20 PM Amogh Sood  wrote:

> Hi all.
>
> I want to thank you all for your contributions and efforts in maintaining
> numpy.
> numpy has been an indispensable tool during both my graduate and
> undergraduate studies.
>
> While I am no software wizard, going forward, I do want to make efforts to
> improve my skills and also contribute to
> projects that I use regularly and benefit from (in my spare time).
>
> PR Specific Questions:
>
> The docs linked here https://numpy.org/doc/stable/dev/ have been helpful
> in learning how to contribute.
> I have also taken some time to review open issues and attempt my first PR.
> Feedback and criticism are more than welcome.
> I tried following the outlined practices to the best ability.
> https://github.com/numpy/numpy/pull/25844
> After submitting the PR, I am looking to remedy the following:
> (i) Github actions `Some checks were not successful 1 cancelled, 3
> skipped, 59 successful, and 1 failing checks` specifically
> `BLAS tests (Linux) / Test Linux (nightly OpenBLAS) (pull_request)` failed.


There are issues with the nightly OpenBLAS wheels at the moment; you can
ignore these failures.


> (ii) I see an unverified tag on the commits associated with my PR? I
> couldn't find the process to remedy this in the docs.


You can ignore this too. If you really want the “verified” bubble you’ll
need to turn on commit signing, but that’s not necessary for NumPy (and IMO
it's mostly security theater).


>
> General Questions
> a) What other resources are available for newcomers (if any)? Specifically
> with respect to mentorship, code review, and feedback channels?


You can join the developer slack chat, see
https://numpy.org/contribute/ for details on how to join.


> b) Are there lists of beginner-friendly BUG/ENH issues that are maintained
> somewhere? (e.g. a good first issue tag?)


There is a “sprintable” tag. NumPy isn’t really a beginner-friendly library,
so we don’t want to mislead people by labeling as easy a bug that would
actually be very hard for an inexperienced developer.


>
> I look forward to learning from and collaborating with this community.
> Many thanks for your time and efforts.


[Numpy-discussion] Re: ENH: Add 'axis' kwarg to np.tile

2024-02-29 Thread Nathan
Hi,

I think the thing to do is argue that this should be included in the array
API:

https://data-apis.org/array-api/latest/API_specification/generated/array_api.tile.html#array_api.tile

Once that’s settled we can add it to NumPy.

In general there’s a feeling that there are already too many keywords in
the API and now that the array API is a thing, we can point to that as a
place to hash out API decisions.

Including syntax in the array API also encourages more libraries to adopt
your preferred syntax.

Nathan

On Thu, Feb 29, 2024 at 4:12 PM  wrote:

> Hoping to get some more feedback on my recent PR [0] which has stagnated a
> bit for the past few weeks.
>
> This adds an `axis` keyword argument to np.tile which may be an int or
> tuple of ints, much like np.sum or np.roll.
>
> This is my first contribution to numpy :)
>
> Thanks,
> Evan
>
> [0]: https://github.com/numpy/numpy/pull/25703


[Numpy-discussion] Re: Arrays of variable itemsize

2024-03-13 Thread Nathan
It is possible to do this using the new DType system.

Sebastian wrote a sketch for a DType backed by the GNU multiprecision float
library:
https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype

It adds a significant amount of complexity to store data outside the array
buffer and introduces the possibility of use-after-free and dangling
reference errors that are impossible if the array does not use embedded
references, so that’s the main reason it hasn’t been done much.

On Wed, Mar 13, 2024 at 8:17 AM Dom Grigonis  wrote:

> Hi all,
>
> Say python’s builtin `int` type. It can be as large as memory allows.
>
> np.ndarray on the other hand is optimized for vectorization via strides,
> memory structure and many things that I probably don’t know. Well the point
> is that it is convenient and efficient to use for many things in comparison
> to python’s built-in list of integers.
>
> So, I am thinking whether something in between exists? (And obviously
> something more clever than np.array(dtype=object))
>
> Probably something similar to `StringDType`, but for integers and floats.
> (It’s just my guess. I don’t know anything about `StringDType`, but just
> guessing it must be better than np.array(dtype=object) in combination
> with np.vectorize)
>
> Regards,
> dgpb
>


[Numpy-discussion] Re: Arrays of variable itemsize

2024-03-13 Thread Nathan
Yes, an array of references still has a fixed width in the array buffer.
You can think of each entry in the array as a pointer to some other memory
on the heap, which can be a dynamic memory allocation.

There's no way in NumPy to support variable-sized array elements in the
array buffer, since that assumption is key to how NumPy implements strided
ufuncs and broadcasting.
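A quick way to see the fixed-width-reference behavior, assuming NumPy is installed:

```python
import numpy as np

# Each entry in the buffer is a fixed-size pointer to a Python object on
# the heap, even though the objects themselves vary wildly in size.
arr = np.array([0, 10**100], dtype=object)

print(arr.itemsize)       # size of a pointer, e.g. 8 on 64-bit systems
print(arr[1] == 10**100)  # True: the arbitrary-precision int is preserved
```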

On Wed, Mar 13, 2024 at 9:34 AM Dom Grigonis  wrote:

> Thank you for this.
>
> I am just starting to think about these things, so I appreciate your
> patience.
>
> But isn’t it still true that all elements of an array are still of the
> same size in memory?
>
> I am thinking along the lines of per-element dynamic memory management.
> Such that if I had array [0, 1e1], the first element would default to
> reasonably small size in memory.
>
> On 13 Mar 2024, at 16:29, Nathan  wrote:
>
> It is possible to do this using the new DType system.
>
> Sebastian wrote a sketch for a DType backed by the GNU multiprecision
> float library:
> https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype
>
> It adds a significant amount of complexity to store data outside the array
> buffer and introduces the possibility of use-after-free and dangling
> reference errors that are impossible if the array does not use embedded
> references, so that’s the main reason it hasn’t been done much.
>
> On Wed, Mar 13, 2024 at 8:17 AM Dom Grigonis 
> wrote:
>
>> Hi all,
>>
>> Say python’s builtin `int` type. It can be as large as memory allows.
>>
>> np.ndarray on the other hand is optimized for vectorization via strides,
>> memory structure and many things that I probably don’t know. Well the point
>> is that it is convenient and efficient to use for many things in comparison
>> to python’s built-in list of integers.
>>
>> So, I am thinking whether something in between exists? (And obviously
>> something more clever than np.array(dtype=object))
>>
>> Probably something similar to `StringDType`, but for integers and
>> floats. (It’s just my guess. I don’t know anything about `StringDType`,
>> but just guessing it must be better than np.array(dtype=object) in
>> combination with np.vectorize)
>>
>> Regards,
>> dgpb
>>


[Numpy-discussion] Re: dtype=object arrays not treated like Python list?

2024-03-29 Thread Nathan
On Fri, Mar 29, 2024 at 8:39 AM Jim Pivarski  wrote:

> On Fri, Mar 29, 2024 at 8:07 AM Steven G. Johnson  wrote:
>
>> Should a dtype=object array be treated more like Python lists for type
>> detection/coercion reasons?   Currently, they are treated quite differently:
>> >>> np.isfinite([1,2,3])
>> array([ True,  True,  True])
>>
>
> In this case, the `[1, 2, 3]` is being converted into an array of integers
> before it is acted upon by `np.isfinite`. The function needs an array and
> the list-to-array coercion selects a reasonably narrow dtype based on the
> values in the list. In principle, the dtype of the list of Python `int`
> values could be `object`, `np.int64`, `np.int32`, or maybe even a floating
> point type (because integers are a subset of the reals), but `np.int64` is
> reasonable. (Although on Windows, the choice is `np.int32`!)
>

This is a good answer to the original question, but I just want to offer the
small clarification that this changed in NumPy 2.0: the default integer on
64-bit Windows is now int64.
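A minimal check of the default integer dtype (the result assumes a 64-bit platform; on 64-bit Windows it is int64 only with NumPy 2.0+):

```python
import numpy as np

# A plain Python int list is coerced to the platform default integer dtype.
a = np.array([1, 2, 3])
print(a.dtype)  # int64 on 64-bit Linux/macOS, and on Windows with NumPy 2.0+
```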


> When the `np.isfinite` function is given an array, which already has a
> specified dtype, it takes that dtype seriously and doesn't try to change
> it. So, given `np.array([1, 2, 3], dtype=object)`, no coercion is
> attempted, and `np.isfinite` can't operate on arrays with that dtype.
>
> The difference with respect to Julia is that types are more of a focal
> point of user attention than in Python. NumPy array types are a focal point
> of user attention, but Python types are less so. `Any[]` is a good analogue
> of NumPy arrays with `dtype=object` in the sense that functions don't try
> to downcast them to the narrowest type supported by their values, but there
> isn't a good Julia analogue of a Python list, which is fair game for such
> downcasting. A Python user, like me, *wants* NumPy to figure out that if
> I pass `[1, 2, 3]` to a function that needs an array, it should cast it as
> an array of integers—at least when I'm in the hacking stage of developing a
> project or interactively trying things out in a terminal or Jupyter. Later,
> to have more control over a project that's getting more mature, I'll be
> explicit about types and explicitly cast them as NumPy arrays with
> specified dtypes. (For example, to ensure that integer types have the same
> bit width on all platforms.) Since Julia uses types to determine which
> implementation of a function to run, Julia can't be this loose about types
> at any stage of development: invoking a different method-overload of a
> function is not a small, gradual change.
>
> For the interface between Python and Julia, then, there are going to be
> some hard decisions to make. Not every type on one side has a good
> equivalent on the other. Personally, I would consider it reasonable for all
> Julia AbstractArrays to transfer to Python as NumPy arrays—none of them as
> Python lists. The transformation can't be bijective because Python lists
> would transfer to Julia as `Any[]`. Users of both languages would have to
> be aware that the conversion doesn't round-trip.
>
> -- Jim
>


[Numpy-discussion] Re: Moving the weekly triage/community meetings

2024-04-08 Thread Nathan
That time works for me. I have a conflict an hour earlier than the current
time, so hopefully the later slot works for everyone.

On Sun, Apr 7, 2024 at 8:34 PM Matti Picus  wrote:

> Could we move the weekly community/triage meetings one hour later? Some
> participants have a permanent conflict, and the current time is
> inconvenient for my current time zone.
>
> Matti
>


[Numpy-discussion] Re: PR - can I get a new review?

2024-05-07 Thread Nathan
I think most of the build failures you’re seeing would be fixed by merging
with or rebasing on the latest main branch.

Note that there is currently an issue with some of the windows CI runners,
so you’ll see failures related to our spin configuration failing to handle
a gcov argument that was added in spin 0.9, released a couple of days ago.

On Mon, May 6, 2024 at 8:48 PM Jake S.  wrote:

> Hi community,
>
> PR 26081 is about making
> numpy's ShapeType covariant and bound to a tuple of ints.  The community
> has requested this occasionally in issue 16544. I'm reaching out via the
> listserv because it's been a few months, and I don't want it to get too
> stale.  I could really use some help pushing it over the finish line.
>
> Summary:
> Two numpy reviewers and one interested community member reviewed the PR
> and asked for a type alias akin to npt.NDArray that allowed shape.  I
> worked through the issues with TypeVarTuple and made npt.Array, and it was
> fragile, but passing CI.  After a few months passed, I returned to fix the
> fragility in the hopes of getting some more attention, but now it fails CI
> in some odd builds (passes the mypy bit).  I have no idea how to get these
> to pass, as they appear unrelated to anything I've worked on (OpenBLAS on
> windows, freeBSD...?).
>
> Thanks,
> Jake


[Numpy-discussion] Re: Add context management for np.seterr

2024-05-09 Thread Nathan
I think you're looking for the errstate context manager:

https://numpy.org/doc/stable/reference/generated/numpy.errstate.html
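For example, the snippet quoted below can be written with `errstate` as a context manager; this is a sketch using illustrative arrays `a` and `b`:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([0.0, 2.0])

with np.errstate(divide="ignore", invalid="ignore"):
    result = np.nan_to_num(a / b)  # no divide-by-zero warning inside the block

# outside the block, the previously configured error handling applies again
print(result)
```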

On Thu, May 9, 2024 at 1:11 PM  wrote:

> The current way (according to 1.26 doc) of setting and resetting error is
> ```
> old_settings = np.seterr(all='ignore')  #seterr to known value
> np.seterr(over='raise')
> {'divide': 'ignore', 'over': 'ignore', 'under': 'ignore', 'invalid':
> 'ignore'}
> np.seterr(**old_settings)  # reset to default
> ```
> This may be tedious and not elegant when we need to suppress the error for
> some certain lines, for example, `np.nan_to_num(a/b) ` as we need to
> suppress divide here.
>
> I think it would be way more elegant to use `with` statement here, which
> should be able to be implemented with some simple changes.
>
> An ideal result would be like:
> ```
> with np.seterr(divide='ignore'):
> np.nan_to_num(a/b) # no warning
>
> a/0 # still warn
> ```


[Numpy-discussion] Re: Fastpathing lexsort for integers

2024-05-11 Thread Nathan
Sorry for not responding until prompted. A PR is indeed a better venue
to ask questions about a proposed code change than a mailing list post.

Adding a keyword argument to trigger a fast path seems like a bad python
API to me, since most users won’t notice it. “kind” seems nicer in that
it’s more general, but it would be even better to have some kind of
heuristic to choose the fast path when appropriate, although it sounds like
that’s not possible? Are there cases where the int path is a pessimisation?

It seems to me like it would be more natural to alter the C code as you’re
implying, but I think there’s some confusion about which C function you
need. You probably shouldn’t touch the public C API (PyArray_LexSort is in
the public API). The function that is actually being called in that python
file is a wrapper for the C API function:

https://github.com/numpy/numpy/blob/1e5386334b6f9508964fcd2e1c30293a9d82f026/numpy/_core/src/multiarray/multiarraymodule.c#L3446

So, rather than putting your int fast path in Python, you’d implement it in
C in that file, adding the new “kind” keyword, or some sort of heuristic to
trigger it, to array_lexsort.

If it’s possible to use a heuristic rather than requiring users to opt in,
then it could make sense to update PyArray_LexSort, but changing public C
APIs is much more disruptive in C than in python, so we generally don’t do
it and make python-level optimizations possible in C by adding new
non-public C functions that python can call using private APIs like
_multiarray_umath.
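To illustrate the heuristic idea in Python (a hypothetical sketch; as described above, a real implementation would live in C inside array_lexsort):

```python
import numpy as np

def lexsort_with_int_fastpath(keys):
    """Hypothetical dispatch: use a specialized path when all keys are integers."""
    keys = [np.asarray(k) for k in keys]
    if all(k.dtype.kind in "iu" for k in keys):
        # a real fast path would call a specialized C loop here; we fall
        # through to np.lexsort so this sketch stays runnable
        return np.lexsort(keys)
    return np.lexsort(keys)

# Last key is the primary sort key, as with np.lexsort.
order = lexsort_with_int_fastpath(([2, 1, 1], [0, 2, 1]))
print(order)  # [0 2 1]
```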

Obviously writing CPython C API code is a lot less straightforward than
Python, but the numpy code reviewers have a lot of experience spotting C
API issues and we can point you to resources for learning.

Hope that helps,

Nathan

On Sat, May 11, 2024 at 4:35 PM  wrote:

> Any feedback, even just on where to locate the Python code I wrote?
>
> Otherwise I will try to just open a PR and see how it goes.
>
> Thanks,
>
> Pietro


[Numpy-discussion] Re: Question regarding transitioning native extension to NumPy 2.0

2024-05-24 Thread Nathan
Hi,

The issue is caused by improperly importing the numpy C API. If you apply
this diff, it will work:

diff --git a/_aux.c b/_aux.c
index e3f8f32..435b612 100644
--- a/_aux.c
+++ b/_aux.c
@@ -1,4 +1,6 @@
 #include "Python.h"
+#define NO_IMPORT_ARRAY
+#define PY_ARRAY_UNIQUE_SYMBOL ExtModule
 #include "numpy/arrayobject.h"
 #include "_aux.h"

diff --git a/ext.c b/ext.c
index 65ad2c2..0e8eb3e 100644
--- a/ext.c
+++ b/ext.c
@@ -1,5 +1,6 @@
 #define PY_SSIZE_T_CLEAN
 #include "Python.h"
+#define PY_ARRAY_UNIQUE_SYMBOL ExtModule
 #include "numpy/arrayobject.h"
 #include "_aux.h"

See also this new docs page, which hopefully clarifies this sort of arcane
point:
https://numpy.org/devdocs/reference/c-api/array.html#including-and-importing-the-c-api

We were a bit loose in what we allowed before, effectively leaking
details of the numpy C API. We cleaned that up, but that means C extensions
now need to do this import dance correctly.

Hope that helps,

Nathan

On Fri, May 24, 2024 at 12:52 PM Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:

> I am working to transition mkl_fft and mkl_random to NumPy 2.0.
>
> Both of these projects contain native extensions.
>
>
>
> I have distilled unexpected behavior behind observed test failures in
> minimal C extension:
>
>
>
> https://github.com/oleksandr-pavlyk/reproducer-for-question-about-numpy2
>
>
>
> The extension defines a single Python function which expects a
> numpy.ndarray, and queries its itemsize in two ways:
>
>
>
>    1. By calling a C function declared in “_aux.h” and defined in “_aux.c”
>    that calls PyArray_ITEMSIZE and returns the result
>    2. By calling PyArray_ITEMSIZE directly
>
>
>
> https://github.com/oleksandr-pavlyk/reproducer-for-question-about-numpy2/blob/main/ext.c#L19-L22
>
>
>
> The result obtained by calling the C function is always 0, while the
> direct call gives the correct result.
>
>
>
> I am hoping for advice about what is wrong and how to fix it.
>
>
>
> Thank you,
> Sasha
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] numpy-user-dtypes updated for NumPy 2.0

2024-07-16 Thread Nathan
Hi all,

I just pushed some commits to the numpy-user-dtypes repo (
https://github.com/numpy/numpy-user-dtypes) that fix compatibility with
the public version of the DType API that shipped in NumPy 2.0. If you’ve
been waiting for some runnable examples to look at before trying to write
your own DType, wait no more!

Also Swayam Singh, an intern at Quansight Labs this summer, will be working
on an extended-precision float DType in the user dtypes repo, hopefully
providing a migration path for users of NumPy's 80-bit long double DType so
that NumPy can eventually deprecate and remove it.

If you’re curious about any of that please subscribe to the repo and open
issues or respond here.

Nathan


[Numpy-discussion] Re: numpy-user-dtypes updated for NumPy 2.0

2024-07-17 Thread Nathan
On Wed, Jul 17, 2024 at 1:26 PM Stefan van der Walt 
wrote:

> Hi Nathan,
>
> On Tue, Jul 16, 2024, at 12:24, Nathan wrote:
>
> I just pushed some commits to the numpy-user-dtypes repo (
> https://github.com/numpy/numpy-user-dtypes) that fixes compatibility with
> the public version of the DType API that shipped in NumPy 2.0. If you’ve
> been waiting for some runnable examples to look at before trying to write
> your own DType, wait no more!
>
>
> Thanks for working on these!
>
> I see some dtypes have build instructions, and some don't. Does it make
> sense to add generic build instructions to the repo README, or otherwise at
> least ensure that each has a well described install & activation mechanism?
>

It definitely makes sense. I also added a “reinstall.sh” script I found
useful for local development to each one, but adding some basic docs should
happen too.


>
> Stéfan
>
>


[Numpy-discussion] Re: git status

2024-08-05 Thread Nathan
Why are you rebasing after fetching? You probably don’t want to rebase
what’s supposed to be a public branch on top of a public commit like that.

To make sure your fork and upstream numpy have the same main branch, do:

git fetch numpy main
git checkout numpy/main
git branch -f main
git checkout main
git push -f origin main

Now your old “main” branch is lost; I’m assuming you don’t care about it.
If you do, leave a backup behind by running “git checkout -b main-backup”
before the “git branch -f” step.

On Sun, Aug 4, 2024 at 10:16 PM Andrew Nelson  wrote:

> My git config is:
>
>
> [remote "origin"]
>
> url = https://github.com/andyfaff/numpy.git
>
> fetch = +refs/heads/*:refs/remotes/origin/*
>
> [remote "numpy"]
>
> url = https://github.com/numpy/numpy.git
>
> fetch = +refs/heads/*:refs/remotes/numpy/*
>
> [branch "main"]
>
> remote = numpy
>
> merge = refs/heads/main
>
>
> The main issue seemed to be with my fork on github. The button for
> updating/syncing wasn't usable as there was a conflict. It wasn't apparent
> what the conflict was. This conflict was also preventing me from pushing to
> origin/main.
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


[Numpy-discussion] Re: ENH: Uniform interface for accessing minimum or maximum value of a dtype

2024-08-26 Thread Nathan
That seems reasonable to me on its face. There are some corner cases to
work out though.

Swayam is tinkering with a quad-precision dtype written using the new DType
API and just ran into the fact that finfo doesn’t support user dtypes:

https://github.com/numpy/numpy/issues/27231

IMO any new feature along these lines should have some thought in the
design about how to handle user-defined data types.

Another thing to consider is that data types can be non-numeric (things
like categories) or number-like without being plain numbers, such as a
quantity with a physical unit. That means you should also think about what
to do when fields like min and max don’t make sense for a dtype, or need to
be a generic Python object rather than a numeric type.

I think if someone proposed a NEP that fully worked this out it would be
welcome. My understanding is that the array API consortium prefers to
standardize APIs that gain traction in libraries rather than inventing
APIs and telling libraries to adopt them, so I think a NEP is the right
first step, rather than trying to standardize something in the array API.

On Mon, Aug 26, 2024 at 8:06 AM Lucas Colley 
wrote:

> Or how about `np.dtype_info(dt)`, which could return an object with
> attributes like `min` and `max`. Would that be possible?
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Nathan
On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith  wrote:

> On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers 
> wrote:
> >
> >
> >
> > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith  wrote:
> >>
> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers 
> wrote:
> >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith 
> wrote:
> >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <
> einstein.edi...@gmail.com> wrote:
> >> >> > The fact that we're having to design more and more protocols for a
> lot
> >> >> > of very similar things is, to me, an indicator that we do have
> holistic
> >> >> > problems that ought to be solved by a single protocol.
> >> >>
> >> >> But the reason we've had trouble designing these protocols is that
> >> >> they're each different :-). If it was just a matter of copying
> >> >> __array_ufunc__ we'd have been done in a few minutes...
> >> >
> >> > I don't think that argument is correct. That we now have two very
> similar protocols is simply a matter of history and limited developer time.
> NEP 18 discusses in several places that __array_ufunc__ should be brought
> in line with __array_ufunc__, and that we can migrate a function from one
> protocol to the other. There's no technical reason other than backwards
> compat and dev time why we couldn't use __array_function__ for ufuncs also.
> >>
> >> Huh, that's interesting! Apparently we have a profoundly different
> >> understanding of what we're doing here.
> >
> >
> > That is interesting indeed. We should figure this out first - no point
> discussing a NEP about plugging the gaps in our override system when we
> don't have a common understanding of why we wanted/needed an override
> system in the first place.
> >
> >> To me, __array_ufunc__ and
> >> __array_function__ are completely different. In fact I'd say
> >> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> >
> >
> > It's early days, but "customer feedback" certainly has been more
> enthusiastic for __array_function__. Also from what I've seen so far it
> works well. Example: at the SciPy sprints someone put together Xarray plus
> pydata/sparse to use distributed sparse arrays for visualizing some large
> genetic (I think) data sets. That was made to work in a single day, with
> impressively little code.
>
> Yeah, it's true, and __array_function__ made a bunch of stuff that
> used to be impossible become possible, I'm not saying it didn't. My
> prediction is that the longer we live with it, the more limits we'll
> hit and the more problems we'll have with long-term maintainability. I
> don't think initial enthusiasm is a good predictor of that either way.
>
> >> The key difference is that __array_ufunc__ allows for *generic*
> >> implementations.
> >
> > Implementations of what?
>
> Generic in the sense that you can write __array_ufunc__ once and have
> it work for all ufuncs.
>
> >> Most duck array libraries can write a single
> >> implementation of __array_ufunc__ that works for *all* ufuncs, even
> >> new third-party ufuncs that the duck array library has never heard of,
> >
> >
> > I see where you're going with this. You are thinking of reusing the
> ufunc implementation to do a computation. That's a minor use case (imho),
> and I can't remember seeing it used.
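The "single generic implementation" point discussed above can be illustrated
with a minimal wrapper class (a sketch for illustration only, not code from
this thread):

```python
import numpy as np

class Wrapped:
    """Minimal duck array: one generic __array_ufunc__ handles every
    ufunc, including third-party ones it has never seen."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any Wrapped inputs, defer to the real ufunc (or one of
        # its methods, like reduce/accumulate), and re-wrap the result.
        args = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*args, **kwargs))

w = Wrapped([1.0, 4.0])
print(np.add(w, 1).data)       # [2. 5.]
print(np.sqrt(w).data)         # [1. 2.]
print(np.add.reduce(w).data)   # 5.0 -- ufunc methods dispatch too
```

Note this sketch ignores `out=` and other corner cases; real libraries like
dask and xarray do considerably more bookkeeping around the same pattern.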
>
> I mean, I just looked at dask and xarray, and they're both doing
> exactly what I said, right now in shipping code. What use cases are
> you targeting here if you consider dask and xarray out-of-scope? :-)
>
> > this is case where knowing if something is a ufunc helps use a property
> of it. so there the more specialized nature of __array_ufunc__ helps. Seems
> niche though, and could probably also be done by checking if a function is
> an instance of np.ufunc via __array_function__
>
> Sparse arrays aren't very niche... and the isinstance trick is
> possible in some cases, but (a) it's relying on an undocumented
> implementation detail of __array_function__; according to
> __array_function__'s API contract, you could just as easily get passed
> the ufunc's __call__ method instead of the object itself, and (b) it
> doesn't work at all for ufunc methods like reduce, outer, accumulate.
> These are both show-stoppers IMO.
>
> > This last point, using third-party ufuncs, is the interesting one to me.
> They have to be generated with the NumPy ufunc machinery, so the dispatch
> mechanism is attached to them. We could do third party functions for
> __array_function__ too, but that would require making
> @array_function_dispatch public, which we haven't done (yet?).
>
> With __array_function__ it's theoretically possible to do the dispatch
> on third-party functions, but when someone defines a new function they
> always have to go update all the duck array libraries to hard-code in
> some special knowledge of their new function. So in my example, even
> if we made @array_function_dispatch public, you still couldn't use
> your nice new numba-created gufunc unless you first convinced dask,
> xarray, and bcolz to al

Re: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation

2020-02-14 Thread Nathan
For what it's worth, github search only finds two instances of this usage:

https://github.com/search?q=%22np.dtype%28np.floating%29%22&type=Code
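For anyone trying this locally, the behavior in question looks like this (a
sketch; on NumPy versions where the deprecation has landed, the construction
warns or raises instead of returning float64):

```python
import warnings

import numpy as np

# Checks against the abstract scalar class keep working:
assert np.issubdtype(np.float32, np.floating)
assert np.issubdtype(np.float64, np.floating)

# Constructing a concrete dtype *from* the abstract class is what is being
# deprecated; depending on the NumPy version this returns float64, warns,
# or raises.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        print(np.dtype(np.floating))  # historically: float64
except (TypeError, DeprecationWarning):
    print("np.dtype(np.floating) is deprecated/removed")
```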

On Fri, Feb 14, 2020 at 2:28 PM Sebastian Berg 
wrote:

> Hi all,
>
> In https://github.com/numpy/numpy/pull/15534 I would like to start
> deprecating creating dtypes from "abstract" scalar classes, such as:
>
> np.dtype(np.floating) is np.dtype(np.float64)
>
> While, at the same time, `isinstance(np.float32, np.floating)` is true.
>
> Right now `arr.astype(np.floating, copy=False)` and, more obviously,
> `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32
> array to float64.
>
> I think we should deprecate this, to consistently enable that in the
> future `dtype=np.floating` may choose to not cast a float32 array. Of
> course for the `astype` call the DeprecationWarning would be changed to
> a FutureWarning before we change the result value.
>
> A slight (but hopefully rare) annoyance is that `np.integer` might be
> used since it reads fairly well compared to `np.int_`. The large
> upstream packages such as SciPy or astropy seem to be clean in this
> regard, though (at least almost clean).
>
> Does anyone think this is a bad idea? To me these deprecations seem
> fairly straight forward, possibly flush out bugs/unintended behaviour,
> and necessary for consistent future behaviour. (More similar ones may
> have to follow).
>
> If there is some, but not much, hesitation, I can also add this to the
> NEP 41 draft. Although I currently feel it is the right thing to do
> even if we never had any new dtypes.
>
> - Sebastian
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation

2020-02-14 Thread Nathan
Yeah, that seems to be more popular:

https://github.com/search?q=%22dtype%3Dnp.integer%22&type=Code

On Fri, Feb 14, 2020 at 2:45 PM Sebastian Berg 
wrote:

> On Fri, 2020-02-14 at 14:39 -0700, Nathan wrote:
> > For what it's worth, github search only finds two instances of this
> > usage:
> >
> > https://github.com/search?q=%22np.dtype%28np.floating%29%22&type=Code
> >
>
> The most common thing I would expect is `dtype=np.integer` (possibly
> without the `dtype`, as a positional argument).
> The call your search finds is nice because it must delete the `np.dtype`
> call.
> As is, it is doing the incorrect thing so the deprecation would flush
> out a bug.
>
> - Sebastian
>
>
> > On Fri, Feb 14, 2020 at 2:28 PM Sebastian Berg <
> > sebast...@sipsolutions.net> wrote:
> > > Hi all,
> > >
> > > In https://github.com/numpy/numpy/pull/15534 I would like to start
> > > deprecating creating dtypes from "abstract" scalar classes, such
> > > as:
> > >
> > > np.dtype(np.floating) is np.dtype(np.float64)
> > >
> > > While, at the same time, `isinstance(np.float32, np.floating)` is
> > > true.
> > >
> > > Right now `arr.astype(np.floating, copy=False)` and, more
> > > obviously,
> > > `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32
> > > array to float64.
> > >
> > > I think we should deprecate this, to consistently enable that in
> > > the
> > > future `dtype=np.floating` may choose to not cast a float32 array.
> > > Of
> > > course for the `astype` call the DeprecationWarning would be
> > > changed to
> > > a FutureWarning before we change the result value.
> > >
> > > A slight (but hopefully rare) annoyance is that `np.integer` might
> > > be
> > > used since it reads fairly well compared to `np.int_`. The large
> > > upstream packages such as SciPy or astropy seem to be clean in this
> > > regard, though (at least almost clean).
> > >
> > > Does anyone think this is a bad idea? To me these deprecations seem
> > > fairly straight forward, possibly flush out bugs/unintended
> > > behaviour,
> > > and necessary for consistent future behaviour. (More similar ones
> > > may
> > > have to follow).
> > >
> > > If there is some, but not much, hesitation, I can also add this to
> > > the
> > > NEP 41 draft. Although I currently feel it is the right thing to do
> > > even if we never had any new dtypes.
> > >
> > > - Sebastian
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


[Numpy-discussion] New API for testing custom array containers

2022-11-04 Thread Nathan
Hi all,

I just opened a PR (https://github.com/numpy/numpy/pull/22533) that makes a
minor modification to the numpy API. The PR creates
numpy.testing.overrides, which contains some helpers for downstream
projects who want to test types that implement __array_function__ and
__array_ufunc__. See the discussion in
https://github.com/numpy/numpy/issues/15544 for more context.

-Nathan


[Numpy-discussion] Re: Introducing `np.types`

2023-02-10 Thread Nathan
On Fri, Feb 10, 2023 at 3:31 AM Sebastian Berg 
wrote:

> Hi all,
>
> I was wondering if we should introduce a new `np.types` namespace.  The
> main reason is that we have the DType classes, that most users don't
> need to worry about.  These mirror the scalar classes, but getting them
> is weird currently.
>
> I never wanted to put these in the top-level (because I feel they
> probably won't be used much day to day).  That would be thing like:
>
> * np.types.IntDType, np.types.Int64DType  (or maybe without dtype)
> * np.types.NumberDType  (an abstract DType)
> * np.types.InexactDType
> * ...
> * np.types.DTypeMeta  (the metaclass used, just to have it somewhere)
>
>

> Maybe there are some more types that we could use a public entrypoint
> for  (e.g. the type used by array-function dispatched functions or
> `np.ufunc` could in principle move also).
>

Small bikeshed: the name np.types indicates to me that it has something to
do with static typing. If this namespace only includes dtype classes, then
np.dtype_classes is a more natural name. If it includes things like
`np.ufunc` then that's not as clear, and I don't have a better idea offhand
than np.types.

>
> What do you think?  I don't really have a good idea for an alternative
> but at some point not making these nicely public is not great...
>

Related to your proposal but not orthogonal to it, I still think it would
still be nice to be able to do things like:

>>> np.dtype[numbers.Number]
np.types.NumberDType

I know that currently __class_getitem__ is used by the typing support, but
I think the typing will still work if you also got back a usable dtype
instance at runtime instead of a GenericAlias, which has a confusing repr
and is not useful at runtime.
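For context, here is what the runtime subscripting currently returns (a quick
illustration, checked against recent NumPy, where `np.dtype.__class_getitem__`
produces a `types.GenericAlias`):

```python
import types

import numpy as np

alias = np.dtype[np.int64]
print(type(alias))                  # a types.GenericAlias, not a dtype
print(isinstance(alias, np.dtype))  # False -- not usable at runtime
```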


> (I will note that the DType classes do get printed sometimes in error
> messages.)
>

See also https://github.com/numpy/numpy/issues/22920.


>
> Cheers,
>
> Sebastian
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


[Numpy-discussion] Re: Introducing `np.types`

2023-02-13 Thread Nathan
On Mon, Feb 13, 2023 at 4:34 AM Sebastian Berg 
wrote:

> On Sat, 2023-02-11 at 11:24 +, Ralf Gommers wrote:
> > On Fri, Feb 10, 2023 at 5:35 PM Nathan 
> > wrote:
> >
> 
>
> > > >
> > >
> > > Small bikeshed: the name np.types indicates to me that it has
> > > something to
> > > do with static typing. If this namespace only includes dtype
> > > classes, then
> > > np.dtype_classes is a more natural name. If it includes things like
> > > `np.ufunc` then that's not as clear, and I don't have a better idea
> > > offhand
> > > than np.types.
> > >
> >
> > I had the same concern. It would be good to have a full list of
> > things to
> > put in this and the scope of it. There's other things that could fit,
> > like
> > custom warning and exception classes. We just added
> > `numpy.exceptions` for
> > the next release, but that could still be moved. Having separate
> > submodules
> > for type classes, for exceptions, for array-function override related
> > stuff, etc. is not great. Enums could fit as well, they're now
> > polluting
> > the main namespace. It seems like we need one namespace with all that
> > kind
> > of stuff that the average end user won't need but that becomes
> > useful/important when you're doing something quite custom or are
> > writing a
> > library on top of numpy.
>
>
> Well, right now I care mainly about dtype related types.   Other
> possible ones might be:
> * np.ufunc  (just because I doubt it is used much at all)
> * the array-function dispatcher  (private right now)
> * DTypeMeta (dtype related, though)
>
> But, not sure we have many more.  I don't think I mind `exceptions` to
> be a separate namespace it seems nice and clear?  For the rest, either
> a dtype specific solution or not is fine by me.
>
> I suggested types because Python has `types` as a catch-all for builtin
> types (mainly function, generator, ...) that are not normally used
> directly and it seemed relatively clear and concise.
> (In Python I don't think it has anything to do with typing directly.)


Ah, good analogy! I didn’t think of that.

I think moving to the future where dtype classes have sensible names and
reprs, and can be accessed straightforwardly at runtime without doing
type(dtype_instance), is a real improvement, and thank you for pushing on
this. I don’t think how it’s spelled is as important.

>

>
> - Sebastian
>
>
> >
> > Cheers,
> > Ralf
> >
> >
> >
> > >
> > > > What do you think?  I don't really have a good idea for an
> > > > alternative
> > > > but at some point not making these nicely public is not great...
> > > >
> > >
> > > Related to your proposal but not orthogonal to it, I still think it
> > > would
> > > still be nice to be able to do things like:
> > >
> > > >>> np.dtype[numbers.Number]
> > > np.types.NumberDType
> > >
> > > I know that currently __class_getitem__ is used by the typing
> > > support, but
> > > I think the typing will still work if you also got back a usable
> > > dtype
> > > instance at runtime instead of a GenericAlias, which has a
> > > confusing repr
> > > and is not useful at runtime.
> > >
> > >
> > > > (I will note that the DType classes do get printed sometimes in
> > > > error
> > > > messages.)
> > > >
> > >
> > > See also https://github.com/numpy/numpy/issues/22920.
> > >
> > >
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > > ___
> > > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > > Member address: nathan12...@gmail.com
> > > >
> > > ___
> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > Member address: ralf.gomm...@googlemail.com
> > >
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sebast...@sipsolutions.net
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


[Numpy-discussion] Re: Welcome Nathan Goldbaum as a Maintainer

2023-05-30 Thread Nathan
Thanks so much everyone!

On Mon, May 29, 2023 at 9:00 AM Ross Barnowski  wrote:

> Welcome Nathan!
>
> On Mon, May 29, 2023 at 7:47 AM Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Mon, May 29, 2023 at 1:15 AM Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> Hi all,
>>>
>>> On behalf of the steering council, I am very happy to announce that
>>> Nathan has joined us as a Maintainer!
>>>
>>> Nathan has been consistently contributing and reviewing NumPy PRs for a
>>> while and is for example actively working on a better string DType
>>> which often means diving into the NumPy core as needed.
>>>
>>> Looking forward to working more with you!
>>>
>>> Cheers,
>>>
>>> Sebastian
>>>
>>>
>> Welcome aboard Nathan.
>>
>> Chuck
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: ross...@berkeley.edu
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>


[Numpy-discussion] New user dtypes and the buffer protocol

2023-07-06 Thread Nathan
Hi all,

As you may know, I'm currently working on a variable-width string dtype
using the new experimental user dtype API. As part of this work I'm running
into papercuts that future dtype authors will likely hit and I've been
trying to fix them as I go.

One issue I'd like to raise with the list is that the Python buffer
protocol and the `__array_interface__` protocol support a limited set of
data types.

This leads to three concrete issues I'm working around:

* The `npy` file format uses the type strings defined by the
`__array_interface__` protocol, so any type that doesn't have a type string
defined in that protocol cannot currently be saved [1].

* Cython uses the buffer protocol in its support for numpy arrays and
in the typed memoryview interface so that means any array with a dtype that
doesn't support the buffer protocol cannot be accessed using idiomatic
cython code [2]. The same issue means cython can't easily support float16
or datetime dtypes [3].

* Currently new dtypes don't have a way to export a string version of
themselves that numpy can subsequently load (implicitly importing the
dtype). This makes it more awkward to update downstream libraries that
currently treat dtypes as strings.
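The datetime limitation mentioned in the second point can be seen from pure
Python: NumPy refuses to export a datetime64 array through the buffer
protocol at all (a small illustration; the exact exception type may differ
between NumPy versions):

```python
import numpy as np

a = np.zeros(3, dtype="datetime64[s]")
try:
    memoryview(a)  # asks NumPy to export the array via the buffer protocol
    print("export succeeded")
except (ValueError, BufferError, TypeError) as exc:
    print("buffer export failed:", exc)
```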

One way to fix this is to define an ad-hoc extension to the buffer
protocol. Officially, the buffer protocol only supports the format codes
used in the struct module [4]. Unofficially, memoryview doesn't raise a
NotImplementedError if you pass it an invalid format code, only raising an
error when it tries to access the data. This means we can stuff an
arbitrary string into the format code. See the proposal from Sebastian on
the Python Discuss forum [5] and his proof-of-concept [6]. The hardest
issue with this approach is that it's a social problem, requiring
cross-project coordination with at least Cython, and possibly a PEP to
standardize whatever extension to the buffer protocol we come up with.

Another option would be to exchange data using the arrow data format [7],
which already supports many of the kinds of memory layouts custom dtype
authors might want to use and supports defining custom data types [8]. The
big issue here is that NumPy probably can't depend on the arrow C++ library
(I think?) so we would need to write a bunch of code to support arrow data
layouts and data types, but then we would also need to do the same thing on
the Cython side.

Implementing either of these approaches fixes the issues I enumerated above
at the cost of some added complexity. We don't necessarily have to make an
immediate decision for my work to be viable, I can work around most of
these issues, but I think now is probably the time to raise this as an
issue and see if anyone has strong opinions about what NumPy should
ultimately do.

I've raised this on the Cython mailing list to get their take as well [9].

[1] https://github.com/numpy/numpy/issues/24110
[2] https://github.com/numpy/numpy/issues/18442
[3] https://github.com/numpy/numpy/issues/4983
[4] https://docs.python.org/3/library/struct.html#format-strings
[5]
https://discuss.python.org/t/buffer-protocol-and-arbitrary-data-types/26256
[6] https://github.com/numpy/numpy/issues/23500#issuecomment-1525103546
[7] https://arrow.apache.org/docs/format/Columnar.html
[8] https://arrow.apache.org/docs/format/Columnar.html#extension-types
[9] https://mail.python.org/pipermail/cython-devel/2023-July/005434.html


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Nathan
This has come up before, see https://github.com/numpy/numpy/issues/6044 for
the first time this came up; there were several subsequent discussions
linked there.

In the meantime, the data APIs consortium has been actively working on
adding a `cumulative_sum` function to the array API standard, see
https://github.com/data-apis/array-api/issues/597 and
https://github.com/data-apis/array-api/pull/653. The proposed
`cumulative_sum` function includes an `include_initial` keyword argument
that gets the OP's desired behavior.

I think we should probably eventually deprecate `cumsum` and `cumprod` in
favor of the array API standard's `cumulative_sum` and `cumulative_product`
if only because of the embarrassing naming issue. Once the array API
standard has finalized the name for the keyword argument, I think it makes
sense to add the keyword argument to np.cumsum, even if we don't deprecate
it yet. I don't think it makes sense to add a new function just for this.
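For reference, the proposed behavior can already be emulated with existing
NumPy functions (a sketch; `cumsum_with_initial` is a made-up name, and the
standard's final keyword spelling was still being settled at the time):

```python
import numpy as np

def cumsum_with_initial(a, axis=None, dtype=None):
    """Cumulative sum with a leading zero along `axis`, so it inverts np.diff."""
    a = np.asarray(a)
    if axis is None:
        a, axis = a.ravel(), 0
    s = np.cumsum(a, axis=axis, dtype=dtype)
    pad = list(s.shape)
    pad[axis] = 1
    return np.concatenate([np.zeros(pad, dtype=s.dtype), s], axis=axis)

x = np.array([1, 2, 3])
print(cumsum_with_initial(x))            # [0 1 3 6]
print(np.diff(cumsum_with_initial(x)))   # [1 2 3]  (round-trips)
```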

On Fri, Aug 11, 2023 at 6:34 AM  wrote:

> `cumsum` computes the sum of the first k summands for every k from 1.
> Judging by my experience, it is more often useful to compute the sum of the
> first k summands for every k from 0, as `cumsum`'s behaviour leads to
> fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a
> function to NumPy to compute cumulative sums beginning with 0, that is, an
> inverse of `diff`. It might be called `cumsum0`. The following code is
> probably not the best way to implement it, but it illustrates the desired
> behaviour.
>
> ```
> def cumsum0(a, axis=None, dtype=None, out=None):
> """
> Return the cumulative sum of the elements along a given axis,
> beginning with 0.
>
> cumsum0 does the same as cumsum except that cumsum computes the sum
> of the first k summands for every k from 1, and cumsum0, from 0.
>
> Parameters
> ----------
> a : array_like
> Input array.
> axis : int, optional
> Axis along which the cumulative sum is computed. The default
> (None) is to compute the cumulative sum over the flattened
> array.
> dtype : dtype, optional
> Type of the returned array and of the accumulator in which the
> elements are summed. If `dtype` is not specified, it defaults to
> the dtype of `a`, unless `a` has an integer dtype with a
> precision less than that of the default platform integer. In
> that case, the default platform integer is used.
> out : ndarray, optional
> Alternative output array in which to place the result. It must
> have the same shape and buffer length as the expected output but
> the type will be cast if necessary. See
> :ref:`ufuncs-output-type` for more details.
>
> Returns
> -------
> cumsum0_along_axis : ndarray
> A new array holding the result is returned unless `out` is
> specified, in which case a reference to `out` is returned. If
> `axis` is not None, the result has the same shape as `a` except
> along `axis`, where the dimension is larger by 1.
>
> See Also
> --------
> cumsum : Cumulatively sum array elements, beginning with the first.
> sum : Sum array elements.
> trapz : Integration of array values using the composite trapezoidal
> rule.
> diff : Calculate the n-th discrete difference along given axis.
>
> Notes
> -----
> Arithmetic is modular when using integer types, and no error is
> raised on overflow.
>
> ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
> values since ``sum`` may use a pairwise summation routine, reducing
> the roundoff-error. See `sum` for more information.
>
> Examples
> --------
> >>> a = np.array([[1, 2, 3], [4, 5, 6]])
> >>> a
> array([[1, 2, 3],
>[4, 5, 6]])
> >>> np.cumsum0(a)
> array([ 0,  1,  3,  6, 10, 15, 21])
> >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
> array([ 0.,  1.,  3.,  6., 10., 15., 21.])
>
> >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
> array([[0, 0, 0],
>[1, 2, 3],
>[5, 7, 9]])
> >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
> array([[ 0,  1,  3,  6],
>[ 0,  4,  9, 15]])
>
> ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``
>
> >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
> >>> np.cumsum0(b)[-1]
> 1000000.0050045159
> >>> b.sum()
> 1000000.0050000029
>
> """
> a = np.asanyarray(a)
> empty = a.take([], axis=axis)
> zero = empty.sum(axis, dtype=dtype, keepdims=True)
> later_cumsum = a.cumsum(axis, dtype=dtype)
> # np.concatenate's dtype argument requires NumPy >= 1.20
> return np.concatenate([zero, later_cumsum], axis=axis, dtype=dtype,
> out=out)
> ```
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org

Re: [Numpy-discussion] __array_ufunc__ counting down to launch, T-24 hrs.

2017-03-31 Thread Nathan Goldbaum
Thanks for linking to the updated NEP, I've been looking for a good
overview of this discussion. Up until now I haven't wanted to wade through
the extensive discussion on this topic.

I'm curious: if I want to simultaneously support older NumPy versions as
well as newer versions, will I be able to leave implementations of
__array_wrap__ and __array_prepare__ defined alongside __array_ufunc__?
Ideally in such a way that older NumPy versions use __array_wrap__ and
newer versions only use __array_ufunc__.

There isn't any discussion of this in the NEP, but does it also have
impacts on non-ufunc NumPy operations like concatenate, dot, norm, hstack,
and others? We currently make use of wrappers around those functions in yt,
but unfortunately they have poor discoverability for users; it would be
nice if NumPy could do the right thing with nearest subclasses.
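A minimal sketch of what such a dual-protocol class might look like (this is an assumed pattern, not an officially documented one, and the fallback behavior of older releases should be verified against each NumPy version; `WrappedArray` is a hypothetical name):

```python
import numpy as np

class WrappedArray(np.ndarray):
    # Hypothetical subclass defining both hooks. On NumPy >= 1.13,
    # defining __array_ufunc__ means ufunc calls go through it (and
    # __array_wrap__ is not consulted for ufuncs); older releases don't
    # know about __array_ufunc__ and keep using __array_wrap__.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        arrays = [np.asarray(x) for x in inputs]
        result = getattr(ufunc, method)(*arrays, **kwargs)
        return np.asarray(result).view(WrappedArray)

    def __array_wrap__(self, obj, context=None):
        return np.asarray(obj).view(WrappedArray)

a = np.arange(3).view(WrappedArray)
res = np.add(a, 1)
print(type(res).__name__, res)  # WrappedArray [1 2 3]
```

On a new-enough NumPy only `__array_ufunc__` runs here; the `__array_wrap__` definition sits unused but harmless, which is the coexistence the question asks about.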

On Fri, Mar 31, 2017 at 12:04 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi All,
>
> Following Nathaniel's request, I have made a PR that changes the
> original NEP to describe the current implementation.
> * PR at https://github.com/charris/numpy/pull/9
> * Rendered relevant page at
> http://www.astro.utoronto.ca/~mhvk/numpy-doc/neps/ufunc-overrides.html
> It may still be somewhat short on detail, but should now give the
> rationale for what we want to implement.
>
> All the best,
>
> Marten


Re: [Numpy-discussion] NumPy v1.13.0rc1 released.

2017-05-10 Thread Nathan Goldbaum
Hi Chuck,

Is there a docs build for this release somewhere? I'd like to find an
authoritative reference about __array_ufunc__, which I'd hesitated to look
into until now for fear of the API changing.

Nathan

On Wed, May 10, 2017 at 8:49 PM Charles R Harris 
wrote:

> Hi All,
>
> I'm pleased to announce the NumPy 1.13.0rc1 release. This release supports
> Python 2.7 and 3.4-3.6 and contains many new features. It is one of the
> most ambitious releases in the last several years. Some of the highlights
> and new functions are
>
> *Highlights*
>
>- Operations like ``a + b + c`` will reuse temporaries on some
>platforms, resulting in less memory use and faster execution.
>- Inplace operations check if inputs overlap outputs and create
>temporaries to avoid problems.
>- New __array_ufunc__ attribute provides improved ability for classes
>to override default ufunc behavior.
>-  New np.block function for creating blocked arrays.
>
>
> *New functions*
>
>- New ``np.positive`` ufunc.
>- New ``np.divmod`` ufunc provides more efficient divmod.
>- New ``np.isnat`` ufunc tests for NaT special values.
>- New ``np.heaviside`` ufunc computes the Heaviside function.
>- New ``np.isin`` function, improves on ``in1d``.
>- New ``np.block`` function for creating blocked arrays.
>- New ``PyArray_MapIterArrayCopyIfOverlap`` added to NumPy C-API.
>
> Wheels for the pre-release are available on PyPI. Source tarballs,
> zipfiles, release notes, and the Changelog are available on github
> <https://github.com/numpy/numpy/releases/tag/v1.13.0rc1>.
>
> A total of 100 people contributed to this release. People with a "+" by
> their names contributed a patch for the first time.
>
>- A. Jesse Jiryu Davis +
>- Alessandro Pietro Bardelli +
>- Alex Rothberg +
>- Alexander Shadchin
>- Allan Haldane
>- Andres Guzman-Ballen +
>- Antoine Pitrou
>- Antony Lee
>- B R S Recht +
>- Baurzhan Muftakhidinov +
>- Ben Rowland
>- Benda Xu +
>- Blake Griffith
>- Bradley Wogsland +
>- Brandon Carter +
>- CJ Carey
>- Charles Harris
>- Danny Hermes +
>- Duke Vijitbenjaronk +
>- Egor Klenin +
>- Elliott Forney +
>- Elliott M Forney +
>- Endolith
>- Eric Wieser
>- Erik M. Bray
>- Eugene +
>- Evan Limanto +
>- Felix Berkenkamp +
>- François Bissey +
>- Frederic Bastien
>- Greg Young
>- Gregory R. Lee
>- Importance of Being Ernest +
>- Jaime Fernandez
>- Jakub Wilk +
>- James Cowgill +
>- James Sanders
>- Jean Utke +
>- Jesse Thoren +
>- Jim Crist +
>- Joerg Behrmann +
>- John Kirkham
>- Jonathan Helmus
>- Jonathan L Long
>- Jonathan Tammo Siebert +
>- Joseph Fox-Rabinovitz
>- Joshua Loyal +
>- Juan Nunez-Iglesias +
>- Julian Taylor
>- Kirill Balunov +
>- Likhith Chitneni +
>- Loïc Estève
>- Mads Ohm Larsen
>- Marein Könings +
>- Marten van Kerkwijk
>- Martin Thoma
>- Martino Sorbaro +
>- Marvin Schmidt +
>- Matthew Brett
>- Matthias Bussonnier +
>- Matthias C. M. Troffaes +
>- Matti Picus
>- Michael Seifert
>- Mikhail Pak +
>- Mortada Mehyar
>- Nathaniel J. Smith
>- Nick Papior
>- Oscar Villellas +
>- Pauli Virtanen
>- Pavel Potocek
>- Pete Peeradej Tanruangporn +
>- Philipp A +
>- Ralf Gommers
>- Robert Kern
>- Roland Kaufmann +
>- Ronan Lamy
>- Sami Salonen +
>- Sanchez Gonzalez Alvaro
>- Sebastian Berg
>- Shota Kawabuchi
>- Simon Gibbons
>- Stefan Otte
>- Stefan Peterson +
>- Stephan Hoyer
>- Søren Fuglede Jørgensen +
>- Takuya Akiba
>- Tom Boyd +
>- Ville Skyttä +
>- Warren Weckesser
>- Wendell Smith
>- Yu Feng
>- Zixu Zhao +
>- Zè Vinícius +
>- aha66 +
>- davidjn +
>- drabach +
>- drlvk +
>- jsh9 +
>- solarjoe +
>- zengi +
>
> Cheers,
>
> Chuck


Re: [Numpy-discussion] Controlling NumPy __mul__ method or forcing it to use __rmul__ of the "other"

2017-06-19 Thread Nathan Goldbaum
I don't think there's any real standard here. Just doing a github search
reveals many different choices people have used:

https://github.com/search?l=Python&q=__array_priority__&type=Code&utf8=%E2%9C%93

On Mon, Jun 19, 2017 at 11:07 AM, Ilhan Polat  wrote:

> Thank you. I didn't know that it existed. Is there any place where I can
> get a feeling for a sane priority number compared to what's being done in
> production? Just to make sure I'm not stepping on any toes.
>
> On Mon, Jun 19, 2017 at 5:36 PM, Stephan Hoyer  wrote:
>
>> I answered your question on StackOverflow:
>> https://stackoverflow.com/questions/40694380/forcing-multiplication-to-use-rmul-instead-of-numpy-array-mul-or-byp/44634634#44634634
>>
>> In brief, you need to set __array_priority__ or __array_ufunc__ on your
>> object.
>>
>> On Mon, Jun 19, 2017 at 5:27 AM, Ilhan Polat 
>> wrote:
>>
>>> I will assume some simple linear systems knowledge but the question can
>>> be generalized to any operator that implements __mul__ and __rmul__
>>> methods.
>>>
>>> Motivation:
>>>
>>> I am trying to implement a gain matrix, say a 3x3 identity matrix, for
>>> the time being with a single-input single-output (SISO) system that I have
>>> implemented as a class modeling a Transfer or a state-space representation.
>>>
>>> In the typical use case, suppose you would like to create n-many
>>> parallel connections with the same LTI system sitting at each branch.
>>> MATLAB implements this as an elementwise multiplication, returning a
>>> multi-input multi-output (MIMO) system.
>>>
>>> G = tf(1,[1,1]);
>>> eye(3)*G
>>>
>>> produces (manually compactified)
>>>
>>> ans =
>>>
>>>   From input 1 to output...
>>>     [ 1/(s+1)     0           0       ]
>>>     [ 0           1/(s+1)     0       ]
>>>     [ 0           0           1/(s+1) ]
>>>
>>> Notice that the result is an LTI system, not (as it would be in our
>>> context) a NumPy array with "object" dtype.
>>>
>>> In order to achieve a similar behavior, I would like to let the __rmul__
>>> of G take care of the multiplication. In fact, when I do
>>> G.__rmul__(np.eye(3)) I can control what the behavior should be and I
>>> receive the exception/result I've put in. However the array never looks for
>>> this method and carries out the default array __mul__ behavior.
>>>
>>> The situation is similar if we go about it as left multiplication
>>> G*eye(3) has no problems since this uses directly the __mul__ of G.
>>> Therefore we get a different result depending on the direction of
>>> multiplication.
>>>
>>> Is there anything I can do about this without forcing users to subclass,
>>> or just letting them know about this particular quirk in the documentation?
>>>
>>> What I have in mind is to force the users to create static LTI objects
>>> and then multiply, and to reject this possibility otherwise. But then I
>>> still need to stop NumPy from returning an "object"-dtyped array to be
>>> able to let the user know about this.
>>>
>>>
>>> Relevant links just in case
>>>
>>> the library : https://github.com/ilayn/harold/
>>>
>>> the issue discussion (monologue actually) :
>>> https://github.com/ilayn/harold/issues/7
>>>
>>> The question I've asked on SO (but with a rather offtopic answer):
>>> https://stackoverflow.com/q/40694380/4950339
>>>
>>>
>>> ilhan
>>>


Re: [Numpy-discussion] Boolean binary '-' operator

2017-06-28 Thread Nathan Goldbaum
Just as a comment: It would be really nice if NumPy could slow down the
pace of deprecations, or at least make the warnings about deprecations more
visible. It seems like every release breaks some subset of our test suite
(we only had one or two cases of using the binary - operator on boolean
arrays so it wasn't a big deal this time). For projects that don't have
resources for ongoing maintenance this is a recipe for bitrot...

On Wed, Jun 28, 2017 at 9:48 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> My two ¢: keep things as they are. There is just too much code that
> uses the C definition of bools, 0=False, 1=True. Coupled with casting
> every outcome that is unequal to 0 to True, treating * as AND, + as OR,
> and - as XOR makes sense (and -True would indeed be True, but I'm quite
> happy to have that one removed...).
>
> I lost track a little, but isn't this way also consistent with python,
> the one difference being that numpy does an implicit cast to bool on
> the result?
>
> -- Marten
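The correspondence Marten describes is easy to check with the still-supported logical operators on boolean arrays:

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

print(a & b)  # AND: [ True False False False]
print(a | b)  # OR:  [ True  True  True False]
print(a ^ b)  # XOR: [False  True  True False]

# Multiplication and addition on bools give the same truth tables,
# which is the equivalence (* as AND, + as OR) described above:
print((a * b) == (a & b))  # all True
print((a + b) == (a | b))  # all True
```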


Re: [Numpy-discussion] Dropping support for Accelerate

2017-07-23 Thread Nathan Goldbaum
See
https://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html
and replies in that thread.

Quote from an Apple engineer in that thread:

"For API outside of POSIX, including GCD and technologies like Accelerate,
we do not support usage on both sides of a fork(). For this reason among
others, use of fork() without exec is discouraged in general in processes
that use layers above POSIX."

On Sun, Jul 23, 2017 at 10:16 AM, Ilhan Polat  wrote:

> That's probably because I know nothing about the issue, is there any
> reference I can read about?
>
> But in general, please feel free populate new items in the wiki page.
>
> On Sun, Jul 23, 2017 at 11:15 AM, Nathaniel Smith  wrote:
>
>> I've been wishing we'd stop shipping Accelerate for years, because of
>> how it breaks multiprocessing – that doesn't seem to be on your list
>> yet.
>>
>> On Sat, Jul 22, 2017 at 3:50 AM, Ilhan Polat 
>> wrote:
>> > A few months ago, I had the innocent intention to wrap the LDLt
>> > decomposition routines of LAPACK into SciPy, but then I was made aware
>> > that the minimum required version of LAPACK/BLAS was dictated by the
>> > Accelerate framework. Since then I've been following the core SciPy
>> > team and others' discussion on this issue.
>> >
>> > We have been exchanging opinions for quite a while now within various
>> SciPy
>> > issues and PRs about the ever-increasing Accelerate-related issues and
>> I've
>> > compiled a brief summary about the ongoing discussions to reduce the
>> > clutter.
>> >
>> > First, I would like to kindly invite everyone to contribute and sharpen
>> the
>> > cases presented here
>> >
>> > https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate
>> >
>> > The reason I specifically wanted to post this also to the NumPy mailing
>> > list is to probe the situation from the NumPy-Accelerate perspective. Is
>> > there any NumPy-specific problem that would indirectly affect SciPy should
>> > support for Accelerate be dropped?
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] power function distribution or power-law distribution?

2017-08-24 Thread Nathan Goldbaum
The latest version of numpy is 1.13.

In this case, as described in the docs, a power function distribution is
one with a probability density function of the form ax^(a-1) for x between
0 and 1.
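One quick way to check this interpretation is to compare the sampler against the analytic mean of that density, which is a/(a+1). This is a sketch; the seed and sample size below are arbitrary choices:

```python
import numpy as np

rng = np.random.RandomState(0)  # legacy seeding API, matching 1.x-era NumPy
a = 3.0
samples = rng.power(a, size=200_000)

# pdf is a * x**(a-1) on [0, 1], so the mean is a / (a + 1)
print(samples.mean())    # close to 0.75
print(a / (a + 1.0))     # 0.75
```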

On Thu, Aug 24, 2017 at 9:41 AM, Renato Fabbri 
wrote:

> Thanks for the reply.
>
> But the question remains:
> how are the terms "power function distribution"
> and "power-law distribution" related?
>
> The documentation link you sent have no information on this.
> (
> And seems the same as I get here
> In [6]: n.version.full_version
> Out[6]: '1.11.0'
> )
>
> On Thu, Aug 24, 2017 at 11:07 AM, Pauli Virtanen  wrote:
>
>> to, 2017-08-24 kello 10:53 -0300, Renato Fabbri kirjoitti:
>> > numpy.random.power.__doc__
>> >
>> > uses only the term "power function distribution".
>>
>> The documentation in the most recent Numpy version seems to be more
>> explicit, see the Notes section for the PDF:
>>
>> https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.power.html
>>
>> > BTW.  how is this list related to numpy-discuss...@scipy.org?
>>
>> That's the old address of this list.
>> The current address is numpy-discussion@python.org and it should be
>> used instead.
>>
>> --
>> Pauli Virtanen
>
>
>
> --
> Renato Fabbri
> GNU/Linux User #479299
> labmacambira.sourceforge.net
>


Re: [Numpy-discussion] Github overview change

2017-10-18 Thread Nathan Goldbaum
This is a change in the UI that github introduced a couple weeks ago during
their annual conference.

See https://github.com/blog/2447-a-more-connected-universe

On Wed, Oct 18, 2017 at 11:49 AM Charles R Harris 
wrote:

> On Wed, Oct 18, 2017 at 7:23 AM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> Hi all,
>>
>> Probably silly, but is anyone else annoyed at not seeing comments
>> anymore on the GitHub overview/start page? I stopped getting everything
>> as mail and had a (bad) habit of glancing at the overview, which would
>> catch at least the bigger discussions going on, but now it only shows
>> actual commits, which honestly are less interesting to me.
>>
>> Probably just me, was just wondering if anyone knew a setting or so?
>>
>
> Don't know any settings. It's almost as annoying as not forwarding my own
> comments ...
>
> Chuck
>


Re: [Numpy-discussion] numpy grant update

2017-10-26 Thread Nathan Goldbaum
My understanding of this is that the dtype will only hold the unit
metadata. So that means units would propagate through calculations
automatically, but the dtype wouldn't be able to manipulate the array data
(in an in-place unit conversion, for example).

In this world, astropy quantities and yt's YTArray would become containers
around an ndarray that would make use of the dtype metadata but also
implement all of the unit semantics that they already implement. Since they
would become container classes and would no longer be ndarray subclasses,
that avoids most of the pitfalls one encounters these days.
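As a rough illustration of that container-not-subclass idea (an entirely hypothetical API; the real dtype-metadata design was still being worked out at the time, and `Quantity` here is just a stand-in name):

```python
import numpy as np

class Quantity:
    # A unit-aware container *around* an ndarray rather than an ndarray
    # subclass; the unit attribute stands in for dtype-held metadata,
    # and the unit semantics live entirely in the container.
    def __init__(self, value, unit):
        self.value = np.asarray(value, dtype=float)
        self.unit = unit

    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError("incompatible units")
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return f"Quantity({self.value.tolist()}, unit={self.unit!r})"

q = Quantity([1.0, 2.0], "km") + Quantity([3.0, 4.0], "km")
print(q)  # Quantity([4.0, 6.0], unit='km')
```

Because `Quantity` is not an ndarray subclass, NumPy functions never silently strip or mishandle its unit; operations either go through the container's own methods or fail loudly.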

Please correct me if I'm wrong, Nathaniel.

-Nathan

On Thu, Oct 26, 2017 at 5:14 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Nathaniel,
>
> Thanks for the link. The plans sounds great! You'll not be surprised
> to hear I'm particularly interested in the units aspect (and, no, I
> don't mind at all if we can stop subclassing ndarray...). Is the idea
> that there will be a general way to allow a dtype to define how to
> convert an array to one with another dtype? (Just as one now
> implicitly is able to convert between, say, int and float.) And, if
> so, is the idea that one of those conversion possibilities might
> involve checking units? Or were you thinking of implementing units
> more directly? The former would seem most sensible, if only so you can
> initially focus on other things than deciding how to support, say, esu
> vs emu units, or whether or not to treat radians as equal to
> dimensionless (which they formally are, but it is not always handy to
> do so).
>
> Anyway, do keep us posted! All the best,
>
> Marten
>
> On Thu, Oct 26, 2017 at 3:40 PM, Nathaniel Smith  wrote:
> > On Wed, Oct 18, 2017 at 10:24 PM, Nathaniel Smith  wrote:
> >> I'll also be giving a lunch talk at BIDS tomorrow to let folks locally
> >> know about what's going on, which I think will be recorded – I'll send
> >> around a link after in case others are interested.
> >
> > Here's that link: https://www.youtube.com/watch?v=fowHwlpGb34
> >
> > -n
> >
> > --
> > Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-27 Thread Nathan Goldbaum
I’m using it in yt. If we were able to drop support for all old NumPy
versions, switching would allow me to delete hundreds of lines of code.
As-is, since we need to simultaneously support old and new versions, it adds
some additional complexity. If you’re OK with only supporting NumPy >=
1.13, __array_ufunc__ will make your life a lot easier.

On Fri, Oct 27, 2017 at 6:55 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Just to second Stephan's comment: do try it! I've moved astropy's
> Quantity over to it, and am certainly counting on the basic interface
> staying put...  -- Marten


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-01 Thread Nathan Goldbaum
I think the biggest issues could be resolved if __array_concatenate__ were
finished. Unfortunately I don't feel like I can take that on right now.

See Ryan May's talk at scipy about using an ndarray subclass for units and
the issues he's run into:

https://www.youtube.com/watch?v=qCo9bkT9sow

On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> From my experience with Quantity, routines that properly ducktype work
> well, those that feel the need to accept list and blatantly do
> `asarray` do not - even if in many cases they would have worked if
> they used `asanyarray`...  But there are lots of nice surprises, with,
> e.g., `np.fft.fftfreq` just working as one would hope.  Anyway, bottom
> line, I think you should let this stop you from trying only if you
> know something important does not work. -- Marten


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Nathan Goldbaum
On Thu, Nov 2, 2017 at 2:37 PM, Stephan Hoyer  wrote:

> On Thu, Nov 2, 2017 at 9:45 AM  wrote:
>
>> similar, scipy.special has ufuncs
>> what units are those?
>>
>> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
>> "normal mathematical operations with ufuncs"
>> I guess there are a lot of "abnormal" mathematical operations
>> where just simply propagating the units will not work.
>>
>
>> Aside: The problem is more general also for other datastructures.
>> E.g. statsmodels for most parts uses only numpy ndarrays inside the
>> algorithm and computations because that provides well defined
>> behavior. (e.g. pandas behaved too differently in many cases)
>> I don't have much idea yet about how to change the infrastructure to
>> allow the use of dask arrays, sparse matrices and similar and possibly
>> automatic differentiation.
>>
>
> This is the exact same reason why pandas and xarray do not support
> wrapping arbitrary ndarray subclasses or duck array types. The operations
> we use internally (on numpy.ndarray objects) may not be what you would
> expect externally, and may even be implementation details not considered
> part of the public API. For example, in xarray we use numpy.nanmean() or
> bottleneck.nanmean() instead of numpy.mean().
>
> For NumPy and xarray, I think we could (and should) define an interface to
> support subclasses and duck types for generic operations for core
> use-cases. My main concern with subclasses / duck-arrays is
> undefined/untested behavior, especially where we might silently give the
> wrong answer or trigger some undesired operation (e.g., loading a lazily
> computed array into memory) rather than raising an informative error. Leaking
> implementation details is another concern: we have already had several
> cases in NumPy where a function only worked on a subclass if a particular
> method was called internally, and broke when that was changed.
>

Would this issue be ameliorated given Nathaniel's proposal to try to move
away from subclasses and towards storing data in dtypes? Or would that just
mean that xarray would need to ban dtypes it doesn't know about?


>


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Nathan Goldbaum
On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer  wrote:

> On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum 
> wrote:
>
>> Would this issue be ameliorated given Nathaniel's proposal to try to move
>> away from subclasses and towards storing data in dtypes? Or would that just
>> mean that xarray would need to ban dtypes it doesn't know about?
>>
>
> Yes, I think custom dtypes would definitely help. Custom dtypes have a
> well contained interface, so lots of operations (e.g., concatenate,
> reshaping, indexing) are guaranteed to work in a dtype independent way. If
> you try to do an unsupported operation for such a dtype (e.g.,
> np.datetime64), you will generally get a good error message about an
> invalid dtype.
>
> In contrast, you can overload a subclass with totally arbitrary semantics
> (e.g., np.matrix) and of course for duck-types as well.
>
> This makes a big difference for libraries like dask or xarray, which need
> a standard interface to guarantee they do the right thing. I'm pretty sure
> we can wrap a custom dtype ndarray with units, but there's no way we're
> going to support np.matrix without significant work. It's hard to know
> which is which without well defined interfaces.
>

Ah, but what if the dtype modifies the interface? That might sound evil,
but it's something that's been proposed. For example, if I wanted to
replace yt's YTArray in a backward-compatible way with a dtype and just
use plain ndarrays everywhere, the dtype would need to *at least* modify
ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and
several other things.

Of course if I don't care about backward compatibility I can just do all of
these operations on the dtype object itself. However, I suspect whatever
implementation of custom dtypes gets added to numpy, it will have the
property that it can act like an arbitrary ndarray subclass otherwise
libraries like yt, Pint, metpy, and astropy won't be able to switch to it.

-Nathan


>


Re: [Numpy-discussion] round(numpy.float64(0.0)) is a numpy.float64

2018-03-22 Thread Nathan Goldbaum
numpy.float is an alias for the Python float builtin.

https://github.com/numpy/numpy/issues/3998


On Thu, Mar 22, 2018 at 2:26 PM Olivier  wrote:

> Hello,
>
>
> Is it normal, expected and desired that :
>
>
>   round(numpy.float64(0.0)) is a numpy.float64
>
>
> while
>
>   round(numpy.float(0.0)) is an integer?
>
>
> I find it disturbing and misleading. What do you think? Has it already been
> discussed somewhere else?
>
>
> Best regards,
>
>
> Olivier
>


Re: [Numpy-discussion] Introduction: NumPy developers at BIDS

2018-04-10 Thread Nathan Goldbaum
On Tue, Apr 10, 2018 at 9:59 AM, Stefan van der Walt 
wrote:

> Hi Eric,
>
> On Sun, 08 Apr 2018 08:02:19 -1000, Eric Firing wrote:
> > On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
> > > We would love community input on identifying the best areas & issues to
> > > pay attention to,
> >
> > What is the best way to provide this, and how will the decisions be
> > made?
>
> These are good questions.  We are also new at this, so while we have
> some ideas on how things could work, we may have to refine the process
> along the way.
>
> We want to operate as openly as we can, so discussing ideas on the
> mailing list is a preferred first option.  But we're also open to
> inchoate ideas and recommendations (including on how we run things on
> our end) via email.  Unless instructed explicitly otherwise, those ideas
> will likely bubble up into posts here anyway.
>
> Since we're learning the ropes, we'd like to expose the team to a wide
> variety of ideas.  Visitors to the team are most welcome---please reach
> out to me if you want to talk to us, either in person or via video chat.
>
> Can you help us think of good ways to learn "community priorities"?
> E.g., for GitHub issues, should we take monthly polls, count the number
> of "thumbs up"s, consider issues with the most comments, or tally the
> number of explicit mentions of team members?
>

Keep in mind that only a subset of the community engages on GitHub (mostly
developers who are already engaged in the numpy community).

You may want to explore other venues for this sort of feedback, e.g. a
SciPy BoF session, which will capture a different subset of the community.


>
> Best regards,
> Stéfan


[Numpy-discussion] Short-circuiting equivalent of np.any or np.all?

2018-04-26 Thread Nathan Goldbaum
Hi all,

I was surprised recently to discover that neither np.any() nor np.all() has
a way to exit early:

In [1]: import numpy as np

In [2]: data = np.arange(1e6)

In [3]: print(data[:10])
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

In [4]: %timeit np.any(data)
724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)

In [5]: data = np.zeros(int(1e6))

In [6]: %timeit np.any(data)
732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)

I don't see any discussions about this on the NumPy issue tracker but
perhaps I'm missing something.

I'm curious if there's a way to get a fast early-terminating search in
NumPy? Perhaps there's another package I can depend on that does this? I
guess I could also write a bit of cython code that does this but so far
this project is pure python and I don't want to deal with the packaging
headache of getting wheels built and conda-forge packages set up on all
platforms.
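[Editor's note for the archive: one workaround that needs no new dependency
is to scan the array in fixed-size chunks, which bounds the wasted work
after the first nonzero value. A sketch, with an arbitrary chunk size:]

```python
import numpy as np

def any_chunked(arr, chunk_size=4096):
    """Approximate a short-circuiting np.any by scanning fixed-size chunks."""
    arr = np.asarray(arr).ravel()
    for start in range(0, arr.size, chunk_size):
        # np.any on a small chunk is cheap; stop at the first chunk with a hit
        if np.any(arr[start:start + chunk_size]):
            return True
    return False

data = np.zeros(int(1e6))
data[10] = 1  # a nonzero value near the front means we stop after one chunk
print(any_chunked(data))            # True
print(any_chunked(np.zeros(1000)))  # False
```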

Thanks for your help!

-Nathan


Re: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all?

2018-04-26 Thread Nathan Goldbaum
On Thu, Apr 26, 2018 at 11:52 AM Hameer Abbasi 
wrote:

> Hi Nathan,
>
> np.any and np.all call np.logical_or.reduce and np.logical_and.reduce
> respectively, and unfortunately the underlying function (ufunc.reduce) has
> no way of detecting that the value isn’t going to change anymore. It’s also
> used for (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce),
> np.min (np.minimum.reduce), np.max (np.maximum.reduce).
>
> You can find more information about this on the ufunc doc page
> <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>. I don’t think
> it’s worth it to break this machinery for any and all, as it has numerous
> other advantages (such as being able to override in duck arrays, etc)
>

Sure, I'm not saying that numpy should change, more trying to see if
there's an alternate way to get what I want in NumPy or some other package.


>
> Best regards,
> Hameer Abbasi
> Sent from Astro <https://www.helloastro.com> for Mac
>


Re: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all?

2018-04-26 Thread Nathan Goldbaum
On Thu, Apr 26, 2018 at 12:03 PM Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Would it be useful to have a short-circuited version of the function that
> is not a ufunc?
>

Yes definitely. I could use numba as suggested by Hameer but I'd rather not
add a new runtime dependency. I could use cython or C but I'd need to deal
with the packaging headaches of including C code in your package.

I guess I could also create a new project that just implements the
functions I need in cython, deal with the packaging headaches there, and
then depend on that package. At least that way others won't need to deal
with the pain :)


> - Joe


Re: [Numpy-discussion] Casting scalars

2018-05-10 Thread Nathan Goldbaum
In [1]: import numpy as np

In [2]: np.float64(12)
Out[2]: 12.0

In [3]: np.float64(12).dtype
Out[3]: dtype('float64')

On Thu, May 10, 2018 at 9:49 PM Hameer Abbasi 
wrote:

> Hello, everyone!
>
> I might be missing something and this might be a very stupid and redundant
> question, but is there a way to cast a scalar to a given dtype?
>
> Hameer
>


Re: [Numpy-discussion] Casting scalars

2018-05-10 Thread Nathan Goldbaum
On Thu, May 10, 2018 at 9:51 PM Stuart Reynolds 
wrote:

> np.float(scalar)
>

This actually isn't right. It's a common misconception, but np.float is an
alias to the built-in float type. You probably want np.float_(scalar)

In [5]: np.float_(12).dtype
Out[5]: dtype('float64')

In [6]: np.float is float
Out[6]: True
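[Editor's note: a more general spelling that works for any dtype name, not
just float64, is to go through the dtype object's .type attribute. Sketch:]

```python
import numpy as np

# Any dtype object carries its scalar constructor in the .type attribute,
# so this pattern works for float32, int16, complex dtypes, etc.
dt = np.dtype('float32')
x = dt.type(12)
print(x, x.dtype)                    # 12.0 float32

print(np.dtype('int16').type(3.7))   # casting truncates toward zero: 3
```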


>
> On Thu, May 10, 2018 at 7:49 PM Hameer Abbasi 
> wrote:
>
>> Hello, everyone!
>>
>> I might be missing something and this might be a very stupid and
>> redundant question, but is there a way to cast a scalar to a given dtype?
>>
>> Hameer
>>
>>


Re: [Numpy-discussion] Turn numpy.ones_like into a ufunc

2018-05-18 Thread Nathan Goldbaum
I don't particularly need this, although it would be nice to make this
behavior explicit, instead of happening more or less by accident:

In [1]: from yt.units import km

In [2]: import numpy as np

In [3]: data = [1, 2, 3]*km

In [4]: np.ones_like(data)
Out[4]: YTArray([1., 1., 1.]) km
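[Editor's note: the same pass-through can be reproduced without yt, since
np.ones_like preserves ndarray subclasses by default (subok=True). A minimal
sketch with a trivial subclass standing in for YTArray:]

```python
import numpy as np

class MyArr(np.ndarray):
    """Trivial ndarray subclass, standing in for YTArray."""
    pass

a = np.arange(3.0).view(MyArr)

print(type(np.ones_like(a)).__name__)               # MyArr
print(type(np.ones_like(a, subok=False)).__name__)  # ndarray
```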


On Fri, May 18, 2018 at 9:51 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> I'm greatly in favour, especially if the same can be done for
> `zeros_like` and `empty_like`, but note that a tricky part is that
> ufuncs do not deal very graciously with structured (void) and string
> dtypes. -- Marten


[Numpy-discussion] Citation for ndarray

2018-05-24 Thread Nathan Goldbaum
Hi all,

I see listed on the scipy.org site that the preferred citation for NumPy is
the "Guide to NumPy":

https://www.scipy.org/citing.html

This could work for what I'm writing, but I'd prefer to find a citation
specifically for NumPy's ndarray data structure. Does such a citation exist?

Thanks!

-Nathan


Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-02 Thread Nathan Goldbaum
Perhaps I missed this but I didn’t see: what happens when both
__array_ufunc__ and __array_function__ are defined? I might want to do this
to, for example, add support for functions like concatenate or stack to a
class that already has an __array_ufunc__ defined.

On Sat, Jun 2, 2018 at 5:56 PM Stephan Hoyer  wrote:

> Matthew Rocklin and I have written NEP-18, which proposes a new dispatch
> mechanism for NumPy's high level API:
> http://www.numpy.org/neps/nep-0018-array-function-protocol.html
>
> There has already been a little bit of scattered discussion on the pull
> request (https://github.com/numpy/numpy/pull/11189), but per NEP-0 let's
> try to keep high-level discussion here on the mailing list.
>
> The full text of the NEP is reproduced below:
>
> ==================================================
> NEP: Dispatch Mechanism for NumPy's high level API
> ==================================================
>
> :Author: Stephan Hoyer 
> :Author: Matthew Rocklin 
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-05-29
>
> Abstract
> --------
>
> We propose a protocol to allow arguments of numpy functions to define
> how that function operates on them. This allows other libraries that
> implement NumPy's high level API to reuse Numpy functions. This allows
> libraries that extend NumPy's high level API to apply to more NumPy-like
> libraries.
>
> Detailed description
> --------------------
>
> Numpy's high level ndarray API has been implemented several times
> outside of NumPy itself for different architectures, such as for GPU
> arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel
> arrays (Dask array) as well as various Numpy-like implementations in the
> deep learning frameworks, like TensorFlow and PyTorch.
>
> Similarly there are several projects that build on top of the Numpy API
> for labeled and indexed arrays (XArray), automatic differentiation
> (Autograd, Tangent), higher order array factorizations (TensorLy), etc.
> that add additional functionality on top of the Numpy API.
>
> We would like to be able to use these libraries together, for example we
> would like to be able to place a CuPy array within XArray, or perform
> automatic differentiation on Dask array code. This would be easier to
> accomplish if code written for NumPy ndarrays could also be used by
> other NumPy-like projects.
>
> For example, we would like for the following code example to work
> equally well with any Numpy-like array object:
>
> .. code:: python
>
> def f(x):
> y = np.tensordot(x, x.T)
> return np.mean(np.exp(y))
>
> Some of this is possible today with various protocol mechanisms within
> Numpy.
>
> -  The ``np.exp`` function checks the ``__array_ufunc__`` protocol
> -  The ``.T`` method works using Python's method dispatch
> -  The ``np.mean`` function explicitly checks for a ``.mean`` method on
>the argument
>
> However other functions, like ``np.tensordot`` do not dispatch, and
> instead are likely to coerce to a Numpy array (using the ``__array__``
> protocol), or err outright. To achieve enough coverage of the NumPy API
> to support downstream projects like XArray and autograd we want to
> support *almost all* functions within Numpy, which calls for a more
> reaching protocol than just ``__array_ufunc__``. We would like a
> protocol that allows arguments of a NumPy function to take control and
> divert execution to another function (for example a GPU or parallel
> implementation) in a way that is safe and consistent across projects.
>
> Implementation
> --------------
>
> We propose adding support for a new protocol in NumPy,
> ``__array_function__``.
>
> This protocol is intended to be a catch-all for NumPy functionality that
> is not covered by existing protocols, like reductions (like ``np.sum``)
> or universal functions (like ``np.exp``). The semantics are very similar
> to ``__array_ufunc__``, except the operation is specified by an
> arbitrary callable object rather than a ufunc instance and method.
>
> The interface
> ~~~~~~~~~~~~~
>
> We propose the following signature for implementations of
> ``__array_function__``:
>
> .. code-block:: python
>
> def __array_function__(self, func, types, args, kwargs)
>
> -  ``func`` is an arbitrary callable exposed by NumPy's public API,
>which was called in the form ``func(*args, **kwargs)``.
> -  ``types`` is a list of types for all arguments to the original NumPy
>function call that will be checked for an ``__array_function__``
>implementation.
> -  The tuple ``args`` and dict ``kwargs`` are directly passed on from the
>original call.
>
> Unlike ``__array_ufunc__``, there are no high-level guarantees about the
> type of ``func``, or about which of ``args`` and ``kwargs`` may contain
> objects
> implementing the array API. As a convenience for ``__array_function__``
> implementors of the NumPy API, the ``types`` keyword contains a list of all
> types that implement the ``__array_function__`` protocol.
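[Editor's note: for readers of the archive, a minimal implementation of the
proposed protocol might look like the following sketch. It assumes NumPy >=
1.17, where ``__array_function__`` dispatch is enabled by default; the class
and helper names are made up for illustration.]

```python
import numpy as np

HANDLED_FUNCTIONS = {}  # maps NumPy API functions to our implementations

def implements(np_function):
    """Register an implementation of a NumPy function for MyArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class MyArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defer if we don't implement func, or if an unknown array type
        # is involved; NumPy then raises TypeError upstream.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.concatenate)
def concatenate(arrays, axis=0):
    data = np.concatenate([a.data for a in arrays], axis=axis)
    return MyArray(data)

result = np.concatenate([MyArray([1, 2]), MyArray([3])])
print(type(result).__name__, result.data)  # MyArray [1 2 3]
```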

Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-05 Thread Nathan Goldbaum
Hmm, does this mean the callable that gets passed into __array_ufunc__ will
change? I'm pretty sure that will break the dispatch mechanism I'm using in
my __array_ufunc__ implementation, which directly checks whether the
callable is in one of several tuples of functions that have different
behavior.
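[Editor's note: the tuple-membership dispatch described above looks roughly
like this stripped-down sketch; it is not yt's actual implementation, and
the class and tuple names are invented for illustration.]

```python
import numpy as np

SAME_UNIT = (np.add, np.subtract)  # result keeps the operand's "unit"
UNITLESS = (np.exp, np.log)        # result drops the "unit"

class UnitArray:
    def __init__(self, data, unit=''):
        self.data = np.asarray(data)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != '__call__':
            return NotImplemented
        raw = [x.data if isinstance(x, UnitArray) else x for x in inputs]
        # Dispatch by checking which tuple the ufunc object belongs to
        if ufunc in SAME_UNIT:
            return UnitArray(ufunc(*raw, **kwargs), self.unit)
        if ufunc in UNITLESS:
            return UnitArray(ufunc(*raw, **kwargs), '')
        return NotImplemented

a = UnitArray([1.0, 2.0], 'km')
b = np.add(a, a)
print(b.data, b.unit)  # [2. 4.] km
```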

On Tue, Jun 5, 2018 at 7:32 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Yes, the function should definitely be the same as what the user called -
> i.e., the decorated function. I'm only wondering if it would also be
> possible to have access to the undecorated one (via `coerce` or
> `ndarray.__array_function__` or otherwise).
> -- Marten


Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-05 Thread Nathan Goldbaum
Oh wait, since the decorated version of the ufunc will be the one in the
public numpy API it won't break. It would only break if the callable that
was passed in *wasn't* the decorated version, so it kinda *has* to pass in
the decorated function to preserve backward compatibility. Apologies for
the noise.


On Tue, Jun 5, 2018 at 7:39 PM, Nathan Goldbaum 
wrote:

> Hmm, does this mean the callable that gets passed into __array_ufunc__
> will change? I'm pretty sure that will break the dispatch mechanism I'm
> using in my __array_ufunc__ implementation, which directly checks whether
> the callable is in one of several tuples of functions that have different
> behavior.
>
> On Tue, Jun 5, 2018 at 7:32 PM, Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Yes, the function should definitely be the same as what the user called -
>> i.e., the decorated function. I'm only wondering if it would also be
>> possible to have access to the undecorated one (via `coerce` or
>> `ndarray.__array_function__` or otherwise).
>> -- Marten


Re: [Numpy-discussion] Updated 1.15.0 release notes

2018-06-13 Thread Nathan Goldbaum
Hi Chuck,

Are you planning on doing an rc release this time? I think the NumPy 1.14
release was unusually bumpy and part of that was the lack of an rc. One
example: importing h5py caused a warning under numpy 1.14 and an h5py
release didn’t come out with a workaround or fix for a couple months. There
was also an issue with array printing that caused problems in yt (although
both yt and NumPy quickly did bugfix releases that fixed that).

I guess 1.14 was particularly noisy, but still I’d really appreciate having
a prerelease version to test against and some time to report issues with
the prerelease so numpy and other projects can implement workarounds as
needed without doing a release that might potentially break real users who
happen to install right after numpy 1.x.0 comes out.

Best,
Nathan Goldbaum

On Wed, Jun 13, 2018 at 7:11 PM Charles R Harris 
wrote:

> Hi All,
>
> There is a PR for the updated NumPy 1.15.0 release notes
> <https://github.com/numpy/numpy/pull/11327>. I would appreciate it if
> all those involved in that release would have a look and fix incorrect
> or missing notes.
>
> Cheers,
>
> Chuck


Re: [Numpy-discussion] Updated 1.15.0 release notes

2018-06-13 Thread Nathan Goldbaum
OK I guess I missed that announcement.

I wouldn’t mind more than one email with a reminder to test.

On Wed, Jun 13, 2018 at 7:42 PM Charles R Harris 
wrote:

>
> There was a 1.14.0rc1
> <https://github.com/numpy/numpy/releases/tag/v1.14.0rc1>. I was too quick
> for the full release, just waited three weeks, so maybe four this time. Too
> few people actually test the candidates and give feedback, so I tend to
> regard the *.*.0 releases as the true rc :)
>
> Chuck


Re: [Numpy-discussion] rackspace ssl certificates

2018-06-18 Thread Nathan Goldbaum
I think Matthew Brett needs to fix this.

On Mon, Jun 18, 2018 at 3:20 PM Charles R Harris 
wrote:

> Hi All,
>
> I've been trying to put out the NumPy 1.15.0rc1, but cannot get
> `numpy-wheels` to upload the wheels to rackspace on windows; there is a
> certificate problem. I note that that requirement was supposedly disabled:
>
>  on_success:
>   # Upload the generated wheel package to Rackspace
>   # On Windows, Apache Libcloud cannot find a standard CA cert bundle so we
>   # disable the ssl checks.
>
> and nothing relevant seems to have changed in our `.appveyor.yml` since
> the last successful run 7 days ago, 6 if we count 1.14.5, so I'm thinking a
> policy has changed at either at rackspace or appveyor, but that is just a
> guess. I'm experimenting with various changes to the script and the
> `apache-libcloud` version to see if I can get success, but thought I'd ask
> if anyone knew anything that might be helpful.
>
> Chuck


Re: [Numpy-discussion] PEP 574 - zero-copy pickling with out of band data

2018-07-02 Thread Nathan Goldbaum
On Mon, Jul 2, 2018 at 7:42 PM Andrew Nelson  wrote:

> 
>
> On Tue, 3 Jul 2018 at 09:31, Charles R Harris 
> wrote:
>
>>
>> ISTR that some parallel processing applications sent pickled arrays
>> around to different processes, I don't know if that is still the case, but
>> if so, no copy might be a big gain for them.
>>
>
> That is very much correct. One example is using MCMC, which is massively
> parallel. I do parallelisation with mpi4py, and this requires distribution
> of pickled data of a reasonable size to the entire MPI world. This pickling
> introduces quite a bit of overhead.
>

Doesn’t mpi4py have support for buffered low-level communication of numpy
arrays? See e.g.
https://mpi4py.scipy.org/docs/usrman/tutorial.html

Although I guess that with Antoine’s proposal, uses of the “lowercase”
mpi4py API where data might get pickled will see speedups.
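[Editor's note for later readers: PEP 574 landed as pickle protocol 5 in
Python 3.8, and NumPy arrays support the out-of-band buffer path. A sketch:]

```python
import pickle
import numpy as np

# Pickle protocol 5 (Python 3.8+) can hand large buffers out of band,
# so the array payload need not be copied into the pickle byte stream.
arr = np.arange(10, dtype=np.float64)

buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)

print(np.array_equal(arr, restored))  # True
print(len(buffers))  # number of out-of-band buffers collected
```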



Re: [Numpy-discussion] NumPy 1.15.0rc2 released.

2018-07-09 Thread Nathan Goldbaum
Hi Chuck,

Is there a summary of the differences with respect to rc1 somewhere?

Nathan

On Mon, Jul 9, 2018 at 5:08 PM Charles R Harris 
wrote:

> Hi All,
>
> On behalf of the NumPy team I'm pleased to announce the release of NumPy
> 1.15.0rc2.
> This release has an unusual number of cleanups, many deprecations of old
> functions,
> and improvements to many existing functions. A total of 435 pull requests
> were merged
> for this release, please look at the release notes
> <https://github.com/numpy/numpy/releases/tag/v1.15.0rc2>for details. Some
> highlights are:
>
>- NumPy has switched to pytest for testing.
>- A new  `numpy.printoptions` context manager.
>- Many improvements to the histogram functions.
>- Support for unicode field names in python 2.7.
>- Improved support for PyPy.
>- Fixes and improvements to `numpy.einsum`.
>
> The Python versions supported by this release are 2.7, 3.4-3.7.  The
> wheels are linked with
> OpenBLAS v0.3.0, which should fix some of the linalg problems reported for
> NumPy 1.14.
>
> Wheels for this release can be downloaded from PyPI
> <https://pypi.org/project/numpy/1.15.0rc2/>, source archives are available
> from Github <https://github.com/numpy/numpy/releases/tag/v1.15.0rc2>.
>
> A total of 131 people contributed to this release.  People with a "+" by
> their
> names contributed a patch for the first time.
>
>
>- Aaron Critchley +
>- Aarthi +
>- Aarthi Agurusa +
>- Alex Thomas +
>- Alexander Belopolsky
>- Allan Haldane
>- Anas Khan +
>- Andras Deak
>- Andrey Portnoy +
>- Anna Chiara
>- Aurelien Jarno +
>- Baurzhan Muftakhidinov
>- Berend Kapelle +
>- Bernhard M. Wiedemann
>- Bjoern Thiel +
>- Bob Eldering
>- Cenny Wenner +
>- Charles Harris
>- ChloeColeongco +
>- Chris Billington +
>- Christopher +
>- Chun-Wei Yuan +
>- Claudio Freire +
>- Daniel Smith
>- Darcy Meyer +
>- David Abdurachmanov +
>- David Freese
>- Deepak Kumar Gouda +
>- Dennis Weyland +
>- Derrick Williams +
>- Dmitriy Shalyga +
>- Eric Cousineau +
>- Eric Larson
>- Eric Wieser
>- Evgeni Burovski
>- Frederick Lefebvre +
>- Gaspar Karm +
>- Geoffrey Irving
>- Gerhard Hobler +
>- Gerrit Holl
>- Guo Ci +
>- Hameer Abbasi +
>- Han Shen
>- Hiroyuki V. Yamazaki +
>- Hong Xu
>- Ihor Melnyk +
>- Jaime Fernandez
>- Jake VanderPlas +
>- James Tocknell +
>- Jarrod Millman
>- Jeff VanOss +
>- John Kirkham
>- Jonas Rauber +
>- Jonathan March +
>- Joseph Fox-Rabinovitz
>- Julian Taylor
>- Junjie Bai +
>- Juris Bogusevs +
>- Jörg Döpfert
>- Kenichi Maehashi +
>- Kevin Sheppard
>- Kimikazu Kato +
>- Kirit Thadaka +
>- Kritika Jalan +
>- Lakshay Garg +
>- Lars G +
>- Licht Takeuchi
>- Louis Potok +
>- Luke Zoltan Kelley
>- MSeifert04 +
>- Mads R. B. Kristensen +
>- Malcolm Smith +
>- Mark Harfouche +
>- Marten H. van Kerkwijk +
>- Marten van Kerkwijk
>- Matheus Vieira Portela +
>- Mathieu Lamarre
>- Mathieu Sornay +
>- Matthew Brett
>- Matthew Rocklin +
>- Matthias Bussonnier
>- Matti Picus
>- Michael Droettboom
>- Miguel Sánchez de León Peque +
>- Mike Toews +
>- Milo +
>- Nathaniel J. Smith
>- Nelle Varoquaux
>- Nicholas Nadeau, P.Eng., AVS +
>- Nick Minkyu Lee +
>- Nikita +
>- Nikita Kartashov +
>- Nils Becker +
>- Oleg Zabluda
>- Orestis Floros +
>- Pat Gunn +
>- Paul van Mulbregt +
>- Pauli Virtanen
>- Pierre Chanial +
>- Ralf Gommers
>- Raunak Shah +
>- Robert Kern
>- Russell Keith-Magee +
>- Ryan Soklaski +
>- Samuel Jackson +
>- Sebastian Berg
>- Siavash Eliasi +
>- Simon Conseil
>- Simon Gibbons
>- Stefan Krah +
>- Stefan van der Walt
>- Stephan Hoyer
>- Subhendu +
>- Subhendu Ranjan Mishra +
>- Tai-Lin Wu +
>- Tobias Fischer +
>- Toshiki Kataoka +
>- Tyler Reddy +
>- Unknown +
>- Varun Nayyar
>- Victor Rodriguez +
>- Warren Weckesser
>- William D. Irons +
>- Zane Bradley +
>- fo40225 +
>- lapack_lite code generator +
>- lumbric +
>- luzpaz +
>- mamrehn +
>- tynn +
>- xoviat
>
> Cheers
>
> Chuck
>


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-01 Thread Nathan Goldbaum
I realize this was probably brought up in the discussions about the scipy
code of conduct which I have not looked at, but I’m troubled by the
inclusion of “political beliefs” in the document.

See e.g.
https://github.com/jupyter/governance/pull/5

As a thought experiment, what if someone’s political beliefs imply that
other contributors are not deserving of human rights? Increasingly ideas
like this are coming into the mainstream worldwide and I think this is a
real concern that should be considered.

On Mon, Jul 30, 2018 at 8:25 PM Charles R Harris 
wrote:

> On Fri, Jul 27, 2018 at 4:02 PM, Stefan van der Walt wrote:
>
>> Hi everyone,
>>
>> A while ago, SciPy (the library) adopted its Code of Conduct:
>>
>> https://docs.scipy.org/doc/scipy/reference/dev/conduct/code_of_conduct.html
>>
>> We worked hard to make that document friendly, while at the same time
>> stating clearly the kinds of behavior that would and would not be
>> tolerated.
>>
>> I propose that we adopt the SciPy code of conduct for NumPy as well.  It
>> is a good way to signal to newcomers that this is a community that cares
>> about how people are treated.  And I think we should do anything in our
>> power to make NumPy as attractive as possible!
>>
>> If we adopt this document as policy, we will need to select a Code of
>> Conduct committee, to whom potential transgressions can be reported.
>> The individuals doing this for SciPy may very well be happy to do the
>> same for NumPy, but the community should decide whom will best serve
>> those roles.
>>
>> Let me know your thoughts.
>>
>
> +1 from me.
>
> Chuck


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-01 Thread Nathan Goldbaum
On Wed, Aug 1, 2018 at 9:49 AM, Ralf Gommers  wrote:

>
>
> On Wed, Aug 1, 2018 at 12:20 AM, Nathan Goldbaum 
> wrote:
>
>> I realize this was probably brought up in the discussions about the scipy
>> code of conduct which I have not looked at, but I’m troubled by the
>> inclusion of “political beliefs” in the document.
>>
>
> It was not brought up explicitly as far as I remember.
>
>
>> See e.g.
>> https://github.com/jupyter/governance/pull/5
>>
>
> That's about moving names around. I don't see any mention of political
> beliefs?
>

Sorry about that, I elided the 6. This is the correct link:

https://github.com/jupyter/governance/pull/56


>
>
>> As a thought experiment, what if someone’s political beliefs imply that
>> other contributors are not deserving of human rights? Increasingly ideas
>> like this are coming into the mainstream worldwide and I think this is a
>> real concern that should be considered.
>>
>
> There is a difference between having beliefs, and expressing those beliefs
> in ways that offends others. I don't see any problem with saying that we
> welcome anyone, irrespective of political belief. However, if someone
> starts expressing things that are intolerant (like someone else not
> deserving human rights) on any of our communication forums or in an
> in-person meeting, that would be a clear violation of the CoC. Which can be
> dealt with via the reporting and enforcement mechanism in the CoC.
>
> I don't see a problem here, but I would see a real problem with removing
> the "political beliefs" phrase.
>

For another perspective on this issue see
https://where.coraline.codes/blog/oscon/, where Coraline Ada describes her
reasons for not speaking at OSCON this year due to a similar clause in the
code of conduct.


Cheers,
> Ralf
>
>
>
>>
>> On Mon, Jul 30, 2018 at 8:25 PM Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>> On Fri, Jul 27, 2018 at 4:02 PM, Stefan van der Walt <
>>> stef...@berkeley.edu> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> A while ago, SciPy (the library) adopted its Code of Conduct:
>>>> https://docs.scipy.org/doc/scipy/reference/dev/conduct/code_
>>>> of_conduct.html
>>>>
>>>> We worked hard to make that document friendly, while at the same time
>>>> stating clearly the kinds of behavior that would and would not be
>>>> tolerated.
>>>>
>>>> I propose that we adopt the SciPy code of conduct for NumPy as well.  It
>>>> is a good way to signal to newcomers that this is a community that cares
>>>> about how people are treated.  And I think we should do anything in our
>>>> power to make NumPy as attractive as possible!
>>>>
>>>> If we adopt this document as policy, we will need to select a Code of
>>>> Conduct committee, to whom potential transgressions can be reported.
>>>> The individuals doing this for SciPy may very well be happy to do the
>>>> same for NumPy, but the community should decide whom will best serve
>>>> those roles.
>>>>
>>>> Let me know your thoughts.
>>>>
>>>
>>> +1 from me.
>>>
>>> Chuck
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>>
>
>
>


Re: [Numpy-discussion] Taking back control of the #numpy irc channel

2018-08-06 Thread Nathan Goldbaum
Hi,

I idle in #scipy and have op in there. I’m happy to start idling in #numpy and
be op if the community is willing to let me. I’m also in the process of
getting ops for #matplotlib for similar spam-related reasons. I’d say all
the scientific python IRC channels I’m in get a decent amount of traffic
(perhaps 10% of the number of questions that get asked on StackOverflow)
and it’s a good venue for asking quick questions. Let’s hope that forcing
people to register doesn’t kill that, although there’s not much we can do
given the spam attack.

Nathan

On Mon, Aug 6, 2018 at 9:03 PM Matti Picus  wrote:

> Over the past few days spambots have been hitting freenode's IRC
> channels[0, 1]. It turns out the #numpy channel has no operator, so we
> cannot make the channel mode "+q $~a"[2] - i.e. only registered
> freenode users can talk but anyone can listen.
>
> I was in touch with the freenode staff, they requested that someone from
> the steering council reach out to them at proje...@freenode.net, here
> is the quote from the discussion:
>
> "
> it's pretty much a matter of them sending an email telling us who they'd
> like to represent them on freenode, which channels and cloak namespaces
> they want, and any info we might need on the project
> "
>
> In the mean time they set the channel mode appropriately, so this is
> also a notice that if you want to chat on the #numpy IRC channel you
> need to register.
>
> Hope someone from the council picks this up and reaches out to them, and
> will decide who is able to become channel operators (the recommended
> practice is to use it like sudo, only assume the role when needed then
> turn it back off).
>
> Matti
>
> [0] https://freenode.net/news/spambot-attack
> [1] https://freenode.net/news/spam-shake
> [2] https://nedbatchelder.com/blog/201808/fighting_spam_on_freenode.html
>


[Numpy-discussion] [ANN] 2019 Scipy Conference: Call for Proposals

2019-01-08 Thread Nathan Goldbaum
SciPy 2019, the 18th annual Scientific Computing with Python conference,
will be held July 8-14, 2019 in Austin, Texas. The annual SciPy Conference
brings together over 800 participants from industry, academia, and
government to showcase their latest projects, learn from skilled users and
developers, and collaborate on code development. The call for abstracts for
SciPy 2019 for talks, posters and tutorials is now open. The deadline for
submissions is February 10, 2019.

Conference Website: https://www.scipy2019.scipy.org/

Submission Website: https://easychair.org/conferences/?conf=scipy2019

Talks and Posters (July 10-12, 2019)

In addition to the general track, this year will have specialized tracks
focused on:


   - Data Driven Discoveries (including Machine Learning and Data Science)
   - Open Source Communities (Sustainability)


Mini Symposia

   - Science Communication through Visualization
   - Neuroscience and Cognitive Science
   - Image Processing
   - Earth, Ocean, Geo and Atmospheric Science



There will also be a SciPy Tools Plenary Session each day with 2 to 5
minute updates on tools and libraries.

Tutorials (July 8-9, 2019)

Tutorials should be focused on covering a well-defined topic in a hands-on
manner. We are looking for useful techniques or packages, helping new or
advanced Python programmers develop better or faster scientific
applications. We encourage submissions to be designed to allow at least 50%
of the time for hands-on exercises even if this means the subject matter
needs to be limited. Tutorials will be 4 hours in duration. In your
tutorial application, you can indicate what prerequisite skills and
knowledge will be needed for your tutorial, and the approximate expected
level of knowledge of your students (i.e., beginner, intermediate,
advanced). Instructors of accepted tutorials will receive a stipend.


Re: [Numpy-discussion] Converting np.sinc into a ufunc

2019-05-22 Thread Nathan Goldbaum
It might be worth using BigQuery to search the github repository public
dataset for usages of np.sinc with keyword arguments.
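Concretely, the API change Sebastian describes below is that ufunc inputs are positional-only, while today's `np.sinc` is a plain Python function with a named `x` parameter. A quick sketch of the difference, using `np.add` as a stand-in ufunc:

```python
import numpy as np

arr = np.linspace(-1.0, 1.0, 5)

# np.sinc is currently implemented in Python, so the keyword form works:
by_keyword = np.sinc(x=arr)

# ufunc inputs are positional-only; the analogous keyword call fails:
try:
    np.add(x1=arr, x2=arr)
except TypeError:
    print("ufunc inputs must be passed positionally")
```

Any code found passing `x=` to `np.sinc` would break if the conversion lands without a wrapper.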

On Wed, May 22, 2019 at 1:05 PM Sebastian Berg 
wrote:

> Hi all,
>
> there is an open PR (https://github.com/numpy/numpy/pull/12924) to
> convert `np.sinc` into a ufunc. Since it should improve general
> precision in `np.sinc`, I thought we could try to move that forward a
> bit. We can then check whether this is worth it or not in the end.
>
> However, it would also change behaviour slightly since `np.sinc(x=arr)`
> will not work, as ufuncs are positional arguments only (we could wrap
> `sinc`, but that hides all the nice features). Otherwise, there should
> be no change except additional features of ufuncs and the move to a C
> implementation.
>
> This is mostly to see if anyone is worried about possible slight API
> change here.
>
> All the Best,
>
> Sebastian
>


[Numpy-discussion] Re: Endorsing SPECs 1, 6, 7, and 8

2024-10-08 Thread Nathan via NumPy-Discussion
Thanks for clarifying! In that case I think endorsing SPEC 7 makes sense.

On Tue, Oct 8, 2024 at 3:08 PM Robert Kern  wrote:

> On Tue, Oct 8, 2024 at 8:36 AM Nathan via NumPy-Discussion <
> numpy-discussion@python.org> wrote:
>
>>
>> Since the legacy RNG interface cannot be deprecated and we encourage
>> downstream to use it in tests according to the text of NEP 19, I'm not sure
>> about the text in SPEC 7 that talks about deprecating using legacy RNGs. Or
>> are you saying that we have now reached the point where we can update NEP
>> 19 to encourage moving away from the legacy interface?
>>
>
>  We have always encouraged people to move away from the legacy
> interface in their APIs. SPEC 7 recommends a principled way for downstream
> projects to implement that move.
>
> NEP 19 acknowledged that sometimes one might still have a use case for
> creating a legacy RandomState object and calling it in their tests to
> generate test data (but not otherwise pass that RandomState object to the
> code under test), but that's not what SPEC 7 addresses. NEP 19 doesn't
> really actively recommend the use of RandomState for this purpose, just
> acknowledges that it's a valid use case that numpy will continue to support
> even while we push for the exclusive use of Generator inside of
> library/program code. NEP 19 doesn't need an update for us to endorse SPEC
> 7 (whether it needs one, separately, to clarify its intent is another
> question).
>
> --
> Robert Kern
>


[Numpy-discussion] Re: Endorsing SPECs 1, 6, 7, and 8

2024-10-08 Thread Nathan via NumPy-Discussion
Regarding thread safety - that's not a problem. At least for Python 3.13,
the GIL is temporarily re-enabled during imports. That won't necessarily be
true in the future, but separately CPython also uses per-module locks on
import, so there shouldn't be any issues with threads simultaneously
importing submodules.

It looks like we already implement lazy-loading for e.g. linalg, fft,
random, and other submodules. Does that lazy-loading mechanism conform to
the SPEC? If not, should it?

The "keys to the castle" SPEC makes sense to me; I'm fine with endorsing it.
I believe that all of NumPy's online accounts are already spread out over
multiple maintainers, so presumably we don't actually need to do much here
to implement it?

Since the legacy RNG interface cannot be deprecated and we encourage
downstream to use it in tests according to the text of NEP 19, I'm not sure
about the text in SPEC 7 that talks about deprecating using legacy RNGs. Or
are you saying that we have now reached the point where we can update NEP
19 to encourage moving away from the legacy interface? From the text of NEP
19 regarding the legacy RNG interface:

> This NEP does not propose that these requirements remain in perpetuity.
After we have experience with the new PRNG subsystem, we can and should
revisit these issues in future NEPs.
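For reference, the pattern SPEC 7 recommends downstream projects converge on is roughly the following (a sketch; `jittered` is a made-up function, not a NumPy or SPEC API):

```python
import numpy as np

def jittered(data, rng=None):
    """Add Gaussian noise to `data`.

    SPEC 7 style: accept None, an int seed, or a Generator and normalize
    it with np.random.default_rng, instead of reaching for the legacy
    RandomState-backed np.random.* functions.
    """
    rng = np.random.default_rng(rng)
    return data + rng.normal(scale=0.1, size=np.shape(data))

# Same seed, same stream, reproducible results:
a = jittered(np.zeros(4), rng=123)
b = jittered(np.zeros(4), rng=123)
```

Passing an existing Generator through `default_rng` returns it unchanged, so callers who manage their own stream keep full control.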

I don't have a problem with SPEC 8, although I suspect there might be a
fair bit of work to get NumPy's CI to match the suggestions in the SPEC.



On Tue, Oct 8, 2024 at 2:08 PM Joren Hammudoglu via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Is SPEC 1 thread-safe enough for py313+nogil?
>


[Numpy-discussion] Re: What to do with np.matrix

2024-10-14 Thread Nathan via NumPy-Discussion
Here's a github code search for the string "np.matrix":

https://github.com/search?q=%22np.matrix%22&type=code

First, if you narrow down to just Python code, there are almost 60 thousand
results, which is quite high - much higher than what we were comfortable
with for outright removals in NumPy 2.0.

Compared with code searches I did in service of the NumPy 2.0 API changes,
this returns a lot of repositories in the flavor of "someone's homework
assignments" rather than "core scientific python package" or "package owned
by a billion dollar corporation".

So, it's good that "important" packages don't seem to use np.matrix much,
but also it's bad given that the code that *does* seem to use it is
probably infrequently or poorly tested, and will require a lengthy
deprecation period to catch, if the authors are inclined to do anything
about it at all.

In that case, I think moving things to an external pypi package along with
a long-lived shim in NumPy that points people to the pypi package is
probably the least disruptive thing to do, if we're going to do anything.

-Nathan


[Numpy-discussion] Re: NumPy 2.2.0 Released

2024-12-08 Thread Nathan via NumPy-Discussion
Improvements to the promoters for some of the string ufuncs:
https://github.com/numpy/numpy/pull/27636

Support for stringdtype arrays in the type hints and typing support for the
string ufuncs:
https://github.com/numpy/numpy/pull/27470

If you have a particular improvement you’re looking for I’d love to hear
more.

On Sun, Dec 8, 2024 at 3:01 PM Neal Becker via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Where can I find more information on improvements to stringdtype?
>
> On Sun, Dec 8, 2024, 11:25 AM Charles R Harris via NumPy-Discussion <
> numpy-discussion@python.org> wrote:
>
>> Hi All,
>>
>> On behalf of the NumPy team, I'm pleased to announce the release of NumPy
>> 2.2.0. The NumPy 2.2.0 release is a short release that brings us back
>> into sync with the usual twice yearly release cycle. There have been a
>> number of small cleanups, as well as work bringing the new StringDType to
>> completion and improving support for free threaded Python. Highlights are:
>>
>>- New functions `matvec` and `vecmat`, see below.
>>- Many improved annotations.
>>- Improved support for the new StringDType.
>>- Improved support for free threaded Python
>>- Fixes for f2py
>>
>> This release supports Python 3.10-3.13. Wheels can be downloaded from
>> PyPI; source archives, release notes, and wheel hashes are available on
>> GitHub.
>>
>> Cheers,
>>
>> Charles Harris
>>
>


[Numpy-discussion] New GitHub issue UI

2025-01-14 Thread Nathan via NumPy-Discussion
Hi all,

GitHub is rolling out the new UI for issues, which includes a lot of new
opportunities to reorganize our backlog. More detail on the changelog blog:
https://github.blog/changelog/2025-01-13-evolving-github-issues-public-preview/

In particular, there is now much richer support for tracking issues by
marking issues as "sub-issues". We can also (finally) get rid of the issue
category labels - GitHub now has support for "issue types".

If someone with triage rights would like to take this on, it would be a
nice project to go through the backlog and update things to use the new
system, and to update the bot that auto-applies labels. You could probably
use a script rather than doing it manually.

-Nathan


[Numpy-discussion] Re: Wondering if there is interest in a "variable convolution" feature in numpy?

2025-06-04 Thread Nathan via NumPy-Discussion
NumPy probably isn’t the right spot for this - we’re very conservative
about adding new functionality to NumPy that might also live in SciPy.
SciPy has convolution functionality but I’m not sure if they would want
greenfield code for this. Definitely worth asking the SciPy developers.

That said, have you considered publishing and promoting your own package on
PyPI and conda-forge? It’s a bit of work to get everything set up, but at
least these days you can “publish” the work (in an academic sense)
relatively straightforwardly with a Journal of Open Source Software
submission.

See also the PyOpenSci guide, which has extensive guidance for writing and
publishing packages for general consumption:

https://www.pyopensci.org/python-package-guide/index.html
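For readers who haven't met the problem, "variable convolution" here means the kernel width changes across the signal. A naive O(n²) sketch (illustrative only, unrelated to the det-lab code):

```python
import numpy as np

def varconvolve_gaussian(signal, sigmas):
    """Smooth `signal` with a Gaussian whose width varies per sample.

    Each output sample is a normalized Gaussian-weighted average centered
    on that sample, using its own sigma. A real implementation needs to
    treat edges, normalization conventions, and performance much more
    carefully - this only shows the shape of the computation.
    """
    signal = np.asarray(signal, dtype=float)
    x = np.arange(signal.size)
    out = np.empty_like(signal)
    for i, sigma in enumerate(sigmas):
        weights = np.exp(-0.5 * ((x - i) / sigma) ** 2)
        out[i] = weights @ signal / weights.sum()
    return out

spike = [0.0, 0.0, 1.0, 0.0, 0.0]
narrow = varconvolve_gaussian(spike, sigmas=[0.1] * 5)  # ~ unchanged spike
wide = varconvolve_gaussian(spike, sigmas=[10.0] * 5)   # heavily smoothed
```

Because the kernel differs at every sample, FFT-based convolution doesn't apply directly, which is exactly why specialized implementations exist.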

On Wed, Jun 4, 2025 at 6:32 AM cantor.duster--- via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Hello,
>
> My team and I (especially @Arqu1100) have been working on energy-dependent
> convolutions for a nuclear physics application:
> https://github.com/det-lab/energyDependentColvolve.
>
> We're looking to release this code either as a standalone library or as
> part of a library because we ran into quite a few issues when writing the
> code and would like to help out other groups who need to do this "simple"
> calculation.
>
> This code is definitely not ready for a pull request, but if there's any
> interest in this feature we're happy to create one.  @Arqu1100 has worked
> particularly hard on creating test cases. The only existing library we
> found that does what we do here is varconvolve, which we haven't been able
> to verify against our test cases. Other examples of code that does a
> similar job are referenced on Stack Overflow as being embedded into ldscat (
> https://stackoverflow.com/questions/18624005/how-do-i-perform-a-convolution-in-python-with-a-variable-width-gaussian
> ).
>
> If there's a better place to ask the question, please let me know.  Thanks
> all!
>
> Amy Roberts
>


[Numpy-discussion] Re: Addition of eigenvalue functions

2025-06-12 Thread Nathan via NumPy-Discussion
If functionality is available in SciPy we usually don’t consider adding it
to NumPy. That rules out adding eig(A, B), which scipy.linalg.eig already
supports.

Is there any reason why polyeig doesn’t make sense to add to SciPy instead
of NumPy? Generally if functionality makes sense to add to SciPy that’s
where we point people to.
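For completeness: when B is invertible, the generalized problem A v = λ B v already reduces to an ordinary one expressible in plain NumPy (a sketch of the reduction, not a substitute for a proper QZ-based solver):

```python
import numpy as np

def generalized_eigvals(A, B):
    """Eigenvalues of A v = lambda B v, assuming B is invertible.

    Reduction sketch: solve(B, A) = B^{-1} A, whose ordinary eigenvalues
    are the generalized ones. scipy.linalg.eig(A, B) handles this
    properly (via QZ) without requiring B to be invertible.
    """
    return np.linalg.eigvals(np.linalg.solve(B, A))

A = np.diag([2.0, 6.0])
B = np.diag([1.0, 2.0])
vals = generalized_eigvals(A, B)  # eigenvalues 2.0 and 3.0
```

The singular-B case (infinite eigenvalues) is where the QZ decomposition in SciPy becomes essential.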

On Thu, Jun 12, 2025 at 6:39 AM waqar jamali via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> NumPy currently lacks a generalized eigenvalue function such as eig(A, B)
> or polyeig(A, B).
>
> These functions are essential for several algorithms, including the
> Criss-Cross algorithm and various eigenvalue problems. In particular,
> large-scale problems in control theory are often reduced to subspace
> problems where MATLAB routines like eig(A, B) and polyeig are widely used
> in research. Therefore, I believe that adding such functionality to NumPy
> would be highly beneficial.
>
> I have submitted a pull request here.
>
> https://github.com/numpy/numpy/pull/29163
>


[Numpy-discussion] Re: Tricky ufunc implementation question

2025-07-03 Thread Nathan via NumPy-Discussion
If a NumPy array is shared between two threads, NumPy doesn’t do anything
to synchronize array access. This is true in all Python versions and build
configurations: since NumPy releases the GIL during most array operations,
whether or not you’re using free-threaded Python doesn’t change much,
except for e.g. object arrays, which do hold the GIL.

See:
https://numpy.org/doc/stable/reference/thread_safety.html

IMO you probably shouldn’t try to enforce more strict thread safety than
NumPy itself does.

We didn’t add any locking to support free-threaded Python because it’s
always worked like this, and introducing locking might lead to performance
bottlenecks in read-only multithreaded applications and would substantially
increase NumPy’s internal complexity.

Long-term, I’d like to see more effort put towards adding stronger
guarantees around freezing arrays. I also want to look closer at adding
runtime checks to detect races and report them. One example: you could
imagine each array having an internal “version” counter that is incremented
every time the array is mutated. Doing an atomic read on the version before
and after a mutation should hopefully have a small overhead compared with
the rest of NumPy, and we could report runtime errors when arrays are
mutated “underneath” a thread doing an operation.
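A toy single-threaded illustration of that counter idea (nothing like this exists in NumPy today; `VersionedArray` is invented for this sketch):

```python
class VersionedArray:
    """Toy container that bumps a version counter on every write, so a
    long-running read can detect a concurrent mutation after the fact.
    It ignores the many other mutation paths real arrays have (views,
    the buffer protocol, C extensions)."""

    def __init__(self, data):
        self._data = list(data)
        self._version = 0

    def __setitem__(self, index, value):
        self._version += 1
        self._data[index] = value

    def checked_sum(self, mutate_during=None):
        before = self._version
        if mutate_during is not None:
            mutate_during(self)  # stand-in for a racing writer thread
        total = sum(self._data)
        if self._version != before:
            raise RuntimeError("array mutated during operation")
        return total

arr = VersionedArray([1, 2, 3])
quiet = arr.checked_sum()  # no writer interferes: returns 6
```

In a real implementation the counter reads would need to be atomic, and the check would only ever be a best-effort race detector, not a synchronization mechanism.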

The devil is in the details though - there are *a lot* of ways to mutate
NumPy arrays. This also doesn’t consider the buffer protocol or accessing
arrays via third-party C extensions. See e.g. Alex Gaynor’s blog post on
this from the perspective of Rust and PyO3:

https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/

On Thu, Jul 3, 2025 at 5:50 AM Benjamin Root via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> On a related note, does numpy's gufunc mechanism provide any thread
> safety, or is the responsibility on the extension writer to do that? For
> simple numpy array inputs, I would think that I don't have to worry about
> free-threaded python messing things up (unless I have a global state), I'm
> wondering if something like dask array inputs could mess up calls to a
> thread-unsafe function.
>
> If it is on the extension writer, are there any examples on how to do
> that? Are there other guarantees (or lack thereof) that a gufunc writer
> should be aware of? How about reorderability? gufuncs operate on
> subarrays, so wouldn't dask inputs that are chunked potentially operate on
> the chunks in any order they like?
>
> Thanks,
> Ben Root
>
>
> On Tue, Jul 1, 2025 at 4:26 PM Benjamin Root  wrote:
>
>> Warren,
>>
>> The examples in ufunclab helped clear up a few things and I was able to
>> experiment and get a working gufunc! Thank you for your help!
>>
>> Ben Root
>>
>> On Fri, Jun 27, 2025 at 8:54 PM Benjamin Root 
>> wrote:
>>
>>> Warren,
>>>
>>> I'm fine with implementing it in C. I just didn't think gufuncs were for
>>> me. I couldn't tell from the description if it would be for my usecase
>>> since I wasn't looping over subarrays, and I didn't see any good examples.
>>> Maybe the documentation could be clearer. I'll have a look at your examples.
>>>
>>> I did try that signature with np.vectorize() with the signature keyword
>>> argument, but it didn't seem to work. Maybe it didn't work for the reasons
>>> in that open issue.
>>>
>>> Thank you,
>>> Ben Root
>>>
>>> On Fri, Jun 27, 2025 at 8:03 PM Warren Weckesser via NumPy-Discussion <
>>> numpy-discussion@python.org> wrote:
>>>
 On Fri, Jun 27, 2025 at 5:29 PM Benjamin Root via NumPy-Discussion
  wrote:
 >
 > I'm looking at a situation where I like to wrap a C++ function that
 takes two doubles as inputs, and returns an error code, a position vector,
 and a velocity vector so that I essentially would have a function signature
 of (N), (N) -> (N), (N, 3), (N, 3). When I try to use np.vectorize() or
 np.frompyfunc() on the python version of this function, I keep running into
 issues where it wants to make the outputs into object arrays of tuples. And
 looking at utilizing PyUFunc_FromFuncAndData, it isn't clear to me how I
 can tell it to expect those two output arrays to have a size 3 outer
 dimension.
 >
 > Are ufuncs the wrong thing here? How should I go about this? Is it
 even possible?

 Ben,

 It looks like the simplest signature for your core operation would be
 (),()->(),(3),(3), with broadcasting taking care of higher dimensional
 inputs.  Because not all the core shapes are scalars, that would
 require a *generalized* ufunc (gufunc).  There is an open issue
 (https://github.com/numpy/numpy/issues/14020) with a request for a
 function to generate a gufunc from a Python function.

 numba has the @guvectorize decorator, but I haven't used it much, and
 in my few quick attempts just now, it appeared to not accept fixed
 integer sizes in the output shape.  But wait to see if any numba gurus
 respond with a definitive a