[Numpy-discussion] Proposal to accept NEP 52 (Python API cleanup for 2.0)

2023-09-15 Thread Ralf Gommers
Hi all,

A lot of work has been happening to implement NEP 52 (
https://numpy.org/neps/nep-0052-python-api-cleanup.html) over the past 1.5
months - mostly work by Mateusz Sokol, and review effort of Sebastian,
Nathan and myself. The majority of API changes have been made. There's more
to do of course and there are pending PRs for a good fraction of that.
These two tracking issues cover a lot of ground, including the discussion
around decisions on individual APIs:

- main namespace: https://github.com/numpy/numpy/issues/24306
- numpy.lib namespace: https://github.com/numpy/numpy/issues/24507

This PR with a migration guide will give a good sense of what has been
removed or changed so far: https://github.com/numpy/numpy/pull/24693.

In https://github.com/numpy/numpy/pull/24620 the NEP itself is being
updated for changes that have been made. And it will mark the NEP as
Accepted, which seems about time given that a lot of the work has already
been merged.

If there are no substantive objections within 7 days from this email, then
the NEP will be accepted; see NEP 0 for more details.

Cheers,
Ralf
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal to accept NEP 52 (Python API cleanup for 2.0)

2023-09-15 Thread Dom Grigonis
Hello,

I have a couple of questions:
1. What is the equivalent of np.byte_bounds? I have recently started using it.
2. Why are you removing the business day functionality? Are there faster
methods for it in Python space? As far as I remember, for performance-critical
applications I have always resorted to numpy, including its business day
functionality.

Regards,
DG



[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-15 Thread Warren Weckesser
On Mon, Sep 11, 2023 at 12:25 PM Nathan  wrote:

>
>
> On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>>
>>
>> On Tue, Aug 29, 2023 at 10:09 AM Nathan 
>> wrote:
>> >
>> > The NEP was merged in draft form, see below.
>> >
>> > https://numpy.org/neps/nep-0055-string_dtype.html
>> >
>> > On Mon, Aug 21, 2023 at 2:36 PM Nathan 
>> wrote:
>> >>
>> >> Hello all,
>> >>
>> >> I just opened a pull request to add NEP 55, see
>> https://github.com/numpy/numpy/pull/24483.
>> >>
>> >> Per NEP 0, I've copied everything up to the "detailed description"
>> section below.
>> >>
>> >> I'm looking forward to your feedback on this.
>> >>
>> >> -Nathan Goldbaum
>> >>
>>
>> This will be a nice addition to NumPy, and matches a suggestion by
>> @rkern (and probably others) made in the 2017 mailing list thread;
>> see the last bullet of
>>
>>
>> https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html
>>
>> So +1 for the enhancement!
>>
>> Now for some nitty-gritty review...
>>
>
> Thanks for the nitty-gritty review! I was on vacation last week and
> haven't had a chance to look over this in detail yet, but at first glance
> this seems like a really nice improvement.
>
> I'm going to try to integrate your proposed design into the dtype
> prototype this week. If that works, I'd like to include some of the text
> from the README in your repo in the NEP and add you as an author, would
> that be alright?
>


Sure, that would be fine.

I have a few more comments and questions about the NEP that I'll finish up
and send this weekend.

Warren


>
>
>>
>> There is a design change that I think should be made in the
>> implementation of missing values.
>>
>> In the current design described in the NEP, and expanded on in the
>> comment
>>
>> https://github.com/numpy/numpy/pull/24483#discussion_r1311815944,
>>
>> the meaning of the values `{len = 0, buf = NULL}` in an instance of
>> `npy_static_string` depends on whether or not the `na_object` has been
>> set in the dtype. If it has not been set, that data represents a string
>> of length 0. If `na_object` *has* been set, that data represents a
>> missing value. To get a string of length 0 in this case, some non-NULL
>> value must be assigned to the `buf` field. (In the comment linked
>> above, @ngoldbaum suggested `{0, "\0"}`, but strings are not
>> NUL-terminated, so there is no need for that `\0` in `buf`, and in fact,
>> with `len == 0`, it would be a bug for the pointer to be dereferenced,
>> so *any* non-NULL value--valid pointer or not--could be used for `buf`.)
>>
>> I think it would be better if `len == 0` *always* meant a string with
>> length 0, with no additional qualifications; it shouldn't be necessary
>> to put some non-NULL value in `buf` just to get an empty string. We
>> can achieve this if we use a bit in `len` as a flag for a missing value.
>> Reserving a bit from `len` as a flag reduces the maximum possible string
>> length, but as discussed in the NEP pull request, we're almost certainly
>> going to reserve at least the high bit of `len` when small string
>> optimization (SSO) is implemented. This will reduce the maximum string
>> length to `2**(N-1)-1`, where `N` is the bit width of `size_t`
>> (equivalent to using a signed type for `len`). Even if SSO isn't
>> implemented immediately, we can anticipate the need for flags stored
>> in `len`, and use them to implement missing values.
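The arithmetic in the paragraph above can be checked with a short sketch. It assumes `N = 64` (a 64-bit `size_t`); the names `HIGH_BIT`, `MAX_LEN` and `is_plain_length` are hypothetical and are not part of the NEP or of NumPy.

```python
N = 64  # assumed bit width of size_t on a 64-bit platform

HIGH_BIT = 1 << (N - 1)        # hypothetical flag bit reserved in `len`
MAX_LEN = (1 << (N - 1)) - 1   # largest length once the high bit is reserved


def is_plain_length(len_field):
    """Return True when the reserved bit is clear, so the field is a real length."""
    return (len_field & HIGH_BIT) == 0
```

Under this scheme `len == 0` has the reserved bit clear, so it can always mean an empty string regardless of what is stored in `buf`.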
>>
>> The actual implementation of SSO will require some more design work,
>> because the offset of the most significant byte of `len` within the
>> `npy_static_string` struct depends on the platform endianness. For
>> little-endian, the most significant byte is not the first byte in the
>> struct, so the bytes available for SSO within the struct are not
>> contiguous when the fields have the order `{len, buf}`.
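The endianness point above can be illustrated with Python's `struct` module. This is only a sketch: it assumes 8-byte `len` and `buf` fields (a 64-bit platform), and the helpers `msb_offset` and `free_bytes` are hypothetical names made up for the illustration.

```python
import struct

FIELD_WIDTH = 8  # assumed: size_t and pointers are 8 bytes (64-bit platform)


def msb_offset(byte_order):
    """Byte offset of the most significant byte of an 8-byte unsigned int.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    # Pack a value whose only nonzero byte is the most significant one.
    raw = struct.pack(byte_order + "Q", 0xFF << (8 * (FIELD_WIDTH - 1)))
    return raw.index(0xFF)


def free_bytes(byte_order):
    """Offsets in a 16-byte {len, buf} struct not occupied by the flag byte."""
    flag = msb_offset(byte_order)  # len is the first field, so offsets match
    return [i for i in range(2 * FIELD_WIDTH) if i != flag]
```

For `">"` the flag byte sits at offset 0 and the remaining 15 bytes form one contiguous run. For `"<"` it sits at offset 7, which splits the remaining bytes into offsets 0 to 6 and 8 to 15.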
>>
>> I experimented with these ideas, and put the result at
>>
>> https://github.com/WarrenWeckesser/experiments/tree/master/c/numpy-vstring
>>
>> The idea that I propose there is to make the memory layout of the
>> struct depend on the endianness of the platform, so the most
>> significant byte of `len` (which I called `size`, to avoid any chance
>> of confusion with the actual length of the string [1]) is at the
>> beginning of the struct on big-endian platforms and at the end of the
>> struct for little-endian platforms. More details are included in the
>> file README.md. Note that I am not suggesting that all the SSO stuff
>> be included in the current NEP! This is just a proof-of-concept that
>> shows one possibility for SSO.
>>
>> In that design, the high bit of `size` (which is `len` here) being set
>> indicates that the `npy_static_string` struct should not be interpreted
>> as the standard `{len, buf}` representation of a string. When the
>> second highest bit is set, it means we have a missing value. If the
>> second highest bit is not set, SSO is active; see the link above for
>> more details.
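The flag scheme described above can be sketched as a small decoder. It assumes a 64-bit `size` field; the constant names and the `classify` function are hypothetical, and the linked repository remains the reference for the actual design.

```python
N = 64  # assumed bit width of the `size` field

NOT_PLAIN = 1 << (N - 1)  # highest bit: not the standard {len, buf} form
MISSING = 1 << (N - 2)    # second-highest bit: missing value


def classify(size):
    """Return how a `size` field would be interpreted under this scheme."""
    if not size & NOT_PLAIN:
        return "string"   # ordinary string; size is its length
    if size & MISSING:
        return "missing"  # high bit and second-highest bit both set
    return "sso"          # high bit set, second-highest bit clear
```

In this sketch the second-highest bit is only consulted when the highest bit is set; what a lone second-highest bit should mean is not stated in the message above.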
>>
>> With this desig

[Numpy-discussion] Re: Proposal to accept NEP 52 (Python API cleanup for 2.0)

2023-09-15 Thread Ralf Gommers
On Fri, Sep 15, 2023 at 8:22 PM Dom Grigonis  wrote:

> Hello,
>
> I have a couple of questions:
> 1. What is equivalent of np.byte_bounds? I have recently started using
> this.
>

The migration guide says: Now it's available under
``np.lib.array_utils.byte_bounds``
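For code that has to run on both sides of the move, one possible compatibility shim is sketched below. The helper name `resolve_byte_bounds` is hypothetical; it only assumes the two locations mentioned in this thread (`np.byte_bounds` and `np.lib.array_utils.byte_bounds`).

```python
def resolve_byte_bounds(np_module):
    """Return byte_bounds from whichever location the given module exposes."""
    # New location named in the migration guide.
    lib = getattr(np_module, "lib", None)
    array_utils = getattr(lib, "array_utils", None)
    if array_utils is not None and hasattr(array_utils, "byte_bounds"):
        return array_utils.byte_bounds
    # Fall back to the old location in the main namespace.
    return np_module.byte_bounds
```

Typical use would be `import numpy as np` followed by `byte_bounds = resolve_byte_bounds(np)`.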

> 2. Why are you removing business day functionality? Are there faster
> methods in python space for it? As far as I remember, for performance
> critical applications I have always resorted to numpy, including its
> business day functionality.
>

This change was abandoned, because it was too much work. That is explained
in the PR that updates the NEP (https://github.com/numpy/numpy/pull/24620).

Cheers,
Ralf





[Numpy-discussion] Re: Proposal to accept NEP 52 (Python API cleanup for 2.0)

2023-09-15 Thread Aaron Meurer
The text of the NEP is a bit difficult to read in parts. Some of the
proposed changes are buried in comments in the code examples, and in
some sections it's not clear what is actually being proposed. Parts of
it read more like notes rather than a completed NEP. I would suggest
moving all English text outside of code blocks, and making sure that
complete English sentences are used everywhere.

Aaron Meurer
