[Numpy-discussion] Re: Generalized UFunc without output dimension specified as argument

2023-08-20 Thread Warren Weckesser
On Sun, Aug 20, 2023 at 7:33 AM Doug Turnbull 
wrote:

> First of all, I really love the docs of the C API :) It's way above what I
> would expect!
>
> I was reviewing the signature possibilities for generalized UFuncs, and
> had a question
>
> https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html
>
> I am playing with a UFunc that scores and returns some top N, where N
> could be specified the user. IE the user might do
>
> get_most_similar(X, y, n=10)
>
> You can imagine situations where this could happen in similarity
> functions, where we want to get some Top N rows of X most similar to y. But
> sometimes users will want 10, or 100, or need to page through results etc.
> For performance reasons, I wouldn't want to maintain an index of every row
> of X, I'd prefer to only have to care about the top 10 or so.
>
> I wonder what the best way to do this?
>
> One thought I had was always set the output dimension to 10 for now, and
> handle paging on the python side by perhaps also having an offset parameter
> for my function, to window into the similar results.
>
> The second thought I had was to just get 100 instead of 10, as that
> probably is enough for most use cases. And users can slice out what they
> need. It's a little annoying in terms of perf cost, but probably not a big
> deal.
>
> But it would be convenient to just let the user specify the N they want.
>
>
Thanks for the suggestion, Doug.  This is something I've thought about
too.  In fact, I've drafted a proposal at
https://github.com/WarrenWeckesser/numpy-notes/blob/main/enhancements/gufunc-shape-only-params.md
for allowing "shape only" parameters of a gufunc.  This is the first time
that I've announced that proposal on the mailing list. Any comments from
NumPy devs would be appreciated.

Warren


Thanks for any insights!
> -Doug
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: welcome Andrew Nelson to the NumPy maintainers team

2023-08-22 Thread Warren Weckesser
On Mon, Aug 21, 2023 at 4:37 AM Ralf Gommers  wrote:

> Hi all,
>
> On behalf of the steering council, I am very happy to announce that Andrew
> is joining the Maintainers team. Andrew has been contributing to our CI
> setup in particular for the past year, and has contributed for example the
> Cirrus CI setup and the musllinux builds:
> https://github.com/numpy/numpy/pulls/andyfaff.
>
> Welcome Andrew, I'm looking forward to working with you more
>

> Cheers,
> Ralf
>


Welcome Andrew, and thanks for all the great work you've done so far.

Warren



> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-03 Thread Warren Weckesser
On Tue, Aug 29, 2023 at 10:09 AM Nathan  wrote:
>
> The NEP was merged in draft form, see below.
>
> https://numpy.org/neps/nep-0055-string_dtype.html
>
> On Mon, Aug 21, 2023 at 2:36 PM Nathan  wrote:
>>
>> Hello all,
>>
>> I just opened a pull request to add NEP 55, see
https://github.com/numpy/numpy/pull/24483.
>>
>> Per NEP 0, I've copied everything up to the "detailed description"
section below.
>>
>> I'm looking forward to your feedback on this.
>>
>> -Nathan Goldbaum
>>

This will be a nice addition to NumPy, and matches a suggestion by
@rkern (and probably others) made in the 2017 mailing list thread;
see the last bullet of

 https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html

So +1 for the enhancement!

Now for some nitty-gritty review...

There is a design change that I think should be made in the
implementation of missing values.

In the current design described in the NEP, and expanded on in the
comment

https://github.com/numpy/numpy/pull/24483#discussion_r1311815944,

the meaning of the values `{len = 0, buf = NULL}` in an instance of
`npy_static_string` depends on whether or not the `na_object` has been
set in the dtype. If it has not been set, that data represents a string
of length 0. If `na_object` *has* been set, that data represents a
missing value. To get a string of length 0 in this case, some non-NULL
value must be assigned to the `buf` field. (In the comment linked
above, @ngoldbaum suggested `{0, "\0"}`, but strings are not
NUL-terminated, so there is no need for that `\0` in `buf`, and in fact,
with `len == 0`, it would be a bug for the pointer to be dereferenced,
so *any* non-NULL value--valid pointer or not--could be used for `buf`.)

I think it would be better if `len == 0` *always* meant a string with
length 0, with no additional qualifications; it shouldn't be necessary
to put some non-NULL value in `buf` just to get an empty string. We
can achieve this if we use a bit in `len` as a flag for a missing value.
Reserving a bit from `len` as a flag reduces the maximum possible string
length, but as discussed in the NEP pull request, we're almost certainly
going to reserve at least the high bit of `len` when small string
optimization (SSO) is implemented. This will reduce the maximum string
length to `2**(N-1)-1`, where `N` is the bit width of `size_t`
(equivalent to using a signed type for `len`). Even if SSO isn't
implemented immediately, we can anticipate the need for flags stored
in `len`, and use them to implement missing values.

The actual implementation of SSO will require some more design work,
because the offset of the most significant byte of `len` within the
`npy_static_string` struct depends on the platform endianess. For
little-endian, the most significant byte is not the first byte in the
struct, so the bytes available for SSO within the struct are not
contiguous when the fields have the order `{len, buf}`.

I experimented with these ideas, and put the result at

https://github.com/WarrenWeckesser/experiments/tree/master/c/numpy-vstring

The idea that I propose there is to make the memory layout of the
struct depend on the endianess of the platform, so the most
significant byte of `len` (which I called `size`, to avoid any chance
of confusion with the actual length of the string [1]) is at the
beginning of the struct on big-endian platforms and at the end of the
struct for little-endian platforms. More details are included in the
file README.md. Note that I am not suggesting that all the SSO stuff
be included in the current NEP! This is just a proof-of-concept that
shows one possibility for SSO.

In that design, the high bit of `size` (which is `len` here) being set
indicates that the `npy_static_string` struct should not be interpreted
as the standard `{len, buf}` representation of a string. When the
second highest bit is set, it means we have a missing value. If the
second highest bit is not set, SSO is active; see the link above for
more details.

With this design, `len == 0` *always* means a string of length 0,
regardless of whether or not `na_object` is defined in the dtype.

Also with this design, an array created with `calloc()` will
automatically be an array of empty strings. With current design in
the NEP, an array created with `calloc()` will be either an array of
empty strings, or an array of missing values, depending on whether or
not the dtype has `na_object` defined. That conditional behavior
seems less than desirable.

What do you think?

--Warren

[1] I would like to see `len` renamed to `size` in the
`npy_static_string` struct, but that's bikeshed stuff, and not
a blocker.
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-15 Thread Warren Weckesser
On Mon, Sep 11, 2023 at 12:25 PM Nathan  wrote:

>
>
> On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>>
>>
>> On Tue, Aug 29, 2023 at 10:09 AM Nathan 
>> wrote:
>> >
>> > The NEP was merged in draft form, see below.
>> >
>> > https://numpy.org/neps/nep-0055-string_dtype.html
>> >
>> > On Mon, Aug 21, 2023 at 2:36 PM Nathan 
>> wrote:
>> >>
>> >> Hello all,
>> >>
>> >> I just opened a pull request to add NEP 55, see
>> https://github.com/numpy/numpy/pull/24483.
>> >>
>> >> Per NEP 0, I've copied everything up to the "detailed description"
>> section below.
>> >>
>> >> I'm looking forward to your feedback on this.
>> >>
>> >> -Nathan Goldbaum
>> >>
>>
>> This will be a nice addition to NumPy, and matches a suggestion by
>> @rkern (and probably others) made in the 2017 mailing list thread;
>> see the last bullet of
>>
>>
>> https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html
>>
>> So +1 for the enhancement!
>>
>> Now for some nitty-gritty review...
>>
>
> Thanks for the nitty-gritty review! I was on vacation last week and
> haven't had a chance to look over this in detail yet, but at first glance
> this seems like a really nice improvement.
>
> I'm going to try to integrate your proposed design into the dtype
> prototype this week. If that works, I'd like to include some of the text
> from the README in your repo in the NEP and add you as an author, would
> that be alright?
>


Sure, that would be fine.

I have a few more comments and questions about the NEP that I'll finish up
and send this weekend.

Warren


>
>
>>
>> There is a design change that I think should be made in the
>> implementation of missing values.
>>
>> In the current design described in the NEP, and expanded on in the
>> comment
>>
>> https://github.com/numpy/numpy/pull/24483#discussion_r1311815944,
>>
>> the meaning of the values `{len = 0, buf = NULL}` in an instance of
>> `npy_static_string` depends on whether or not the `na_object` has been
>> set in the dtype. If it has not been set, that data represents a string
>> of length 0. If `na_object` *has* been set, that data represents a
>> missing value. To get a string of length 0 in this case, some non-NULL
>> value must be assigned to the `buf` field. (In the comment linked
>> above, @ngoldbaum suggested `{0, "\0"}`, but strings are not
>> NUL-terminated, so there is no need for that `\0` in `buf`, and in fact,
>> with `len == 0`, it would be a bug for the pointer to be dereferenced,
>> so *any* non-NULL value--valid pointer or not--could be used for `buf`.)
>>
>> I think it would be better if `len == 0` *always* meant a string with
>> length 0, with no additional qualifications; it shouldn't be necessary
>> to put some non-NULL value in `buf` just to get an empty string. We
>> can achieve this if we use a bit in `len` as a flag for a missing value.
>> Reserving a bit from `len` as a flag reduces the maximum possible string
>> length, but as discussed in the NEP pull request, we're almost certainly
>> going to reserve at least the high bit of `len` when small string
>> optimization (SSO) is implemented. This will reduce the maximum string
>> length to `2**(N-1)-1`, where `N` is the bit width of `size_t`
>> (equivalent to using a signed type for `len`). Even if SSO isn't
>> implemented immediately, we can anticipate the need for flags stored
>> in `len`, and use them to implement missing values.
>>
>> The actual implementation of SSO will require some more design work,
>> because the offset of the most significant byte of `len` within the
>> `npy_static_string` struct depends on the platform endianess. For
>> little-endian, the most significant byte is not the first byte in the
>> struct, so the bytes available for SSO within the struct are not
>> contiguous when the fields have the order `{len, buf}`.
>>
>> I experimented with these ideas, and put the result at
>>
>> https://github.com/WarrenWeckesser/experiments/tree/master/c/numpy-vstring
>>
>> The idea that I propose there is to make the memory layout of the
>> struct depend on the endianess of the platform, so the most
>> significant byte of `len` (which I called `size`, to avoid any chance
>> of confusion with the actual length of the string [1]) is at the
>> 

[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-19 Thread Warren Weckesser
On Fri, Sep 15, 2023 at 3:18 PM Warren Weckesser 
wrote:
>
>
>
> On Mon, Sep 11, 2023 at 12:25 PM Nathan  wrote:
>>
>>
>>
>> On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser <
warren.weckes...@gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, Aug 29, 2023 at 10:09 AM Nathan 
wrote:
>>> >
>>> > The NEP was merged in draft form, see below.
>>> >
>>> > https://numpy.org/neps/nep-0055-string_dtype.html
>>> >
>>> > On Mon, Aug 21, 2023 at 2:36 PM Nathan 
wrote:
>>> >>
>>> >> Hello all,
>>> >>
>>> >> I just opened a pull request to add NEP 55, see
https://github.com/numpy/numpy/pull/24483.
>>> >>
>>> >> Per NEP 0, I've copied everything up to the "detailed description"
section below.
>>> >>
>>> >> I'm looking forward to your feedback on this.
>>> >>
>>> >> -Nathan Goldbaum
>>> >>
>>>
>>> This will be a nice addition to NumPy, and matches a suggestion by
>>> @rkern (and probably others) made in the 2017 mailing list thread;
>>> see the last bullet of
>>>
>>>
https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html
>>>
>>> So +1 for the enhancement!
>>>
>>> Now for some nitty-gritty review...
>>
>>
>> Thanks for the nitty-gritty review! I was on vacation last week and
haven't had a chance to look over this in detail yet, but at first glance
this seems like a really nice improvement.
>>
>> I'm going to try to integrate your proposed design into the dtype
prototype this week. If that works, I'd like to include some of the text
from the README in your repo in the NEP and add you as an author, would
that be alright?
>
>
>
> Sure, that would be fine.
>
> I have a few more comments and questions about the NEP that I'll finish
up and send this weekend.
>

One more comment on the NEP...

My first impression of the missing data API design is that
it is more complicated than necessary. An alternative that
is simpler--and is consistent with the pattern established for
floats and datetimes--is to define a "not a string" value, say
`np.nastring` or something similar, just like we have `nan` for
floats and `nat` for datetimes. Its behavior could be what
you called "nan-like".

The handling of `np.nastring` would be an intrinsic part of the
dtype, so there would be no need for the `na_object` parameter
of `StringDType`. All `StringDType`s would handle `np.nastring`
in the same consistent manner.

The use-case for the string sentinel does not seem very
compelling (but maybe I just don't understand the use-cases).
If there is a real need here that is not covered by
`np.nastring`, perhaps just a flag to control the repr of
`np.nastring` for each StringDType instance would be enough?

If there is an objection to a potential proliferation of
"not a thing" special values, one for each type that can
handle them, then perhaps a generic "not a value" (say
`np.navalue`) could be created that, when assigned to an
element of an array, results in the appropriate "not a thing"
value actually being assigned. In a sense, I guess this NEP is
proposing that, but it is reusing the floating point object
`np.nan` as the generic "not a thing" value, and my preference
is that, *if* we go with such a generic object, it is not
the floating point value `nan` but a new thing with a name
that reflects its purpose. (I guess Pandas users might be
accustomed to `nan` being a generic sentinel for missing data,
so its use doesn't feel as incohesive as it might to others.
Passing a string array to `np.isnan()` just feels *wrong* to
me.)

Any, that's my 2¢.

Warren


>
> Warren
>
>>
>>
>>>
>>>
>>> There is a design change that I think should be made in the
>>> implementation of missing values.
>>>
>>> In the current design described in the NEP, and expanded on in the
>>> comment
>>>
>>> https://github.com/numpy/numpy/pull/24483#discussion_r1311815944,
>>>
>>> the meaning of the values `{len = 0, buf = NULL}` in an instance of
>>> `npy_static_string` depends on whether or not the `na_object` has been
>>> set in the dtype. If it has not been set, that data represents a string
>>> of length 0. If `na_object` *has* been set, that data represents a
>>> missing value. To get a string of length 0 in this case, some non-NULL
>>> value must be assigned to the `buf` field. (In the comment linked
>>> above, @ngoldbaum suggested `{

[Numpy-discussion] Re: Adding bfill() to numpy.

2024-05-20 Thread Warren Weckesser
On Mon, May 20, 2024 at 9:06 AM Raquel Braunschweig via
NumPy-Discussion  wrote:
>
> Hello everyone,
>
> My colleague and I will be opening a Pull Request (PR) about adding bfill() 
> (backward fill) function to NumPy. This function is designed to fill NaN 
> values in an array by propagating the next valid observation backward along a 
> specified axis. We believe this addition will be highly useful for data 
> preprocessing and manipulation tasks.


This was proposed earlier this year in this issue:
https://github.com/numpy/numpy/issues/25823

Take note of rkern's comment in that issue; I think most NumPy
developers will agree with what he says there.

Warren


>
> Here are some key points regarding our proposed implementation:
>
> Function Explanation: The bfill() function identifies NaN values in an array 
> and replaces them by copying the next valid value in the array backwards. 
> Users can specify the axis along which the filling should be performed, 
> providing flexibility for different data structures.
> Use Cases: This function is particularly beneficial in time series analysis, 
> data cleaning, and preparing datasets for machine learning models.
>
> We are looking forward to your feedback and suggestions.
>
> Thank you for your attention and we appreciate your support.
>
> Best regards,
> Raquel and Gonçalo
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Enhancement for generalized ufuncs

2024-07-11 Thread Warren Weckesser
I have implemented quite a few generalized ufuncs over in ufunclab
(https://github.com/WarrenWeckesser/ufunclab), and in the process I
have accumulated a gufunc "wish list". Two items on that list are:

(1) the ability to impose constraints on the core dimensions that are
checked when the gufunc is called. By far the most common use-case I
have is requiring that a dimension have length at least 1. To do this
currently, I check the shapes in the ufunc loop function, and if they
are not valid, raise an exception and hope that the gufunc machinery
processes it as expected when the loop function returns. (Sorry, I'm
using lingo--"loop function", "core dimension", etc--that will be
familiar to those who already know the ufunc C API, but not so
familiar to general users of NumPy.)

(2) the ability to have the output dimension be a function of the
input dimensions, instead of being limited to one of the input
dimensions or an independent dimension. Then one could create, for
example, a 1-d convolution gufunc with shape signature that is
effectively `(m),(n)->(m + n - 1)` (corresponding to `mode='full'` in
`np.convolve`) and the gufunc code would automatically allocate the
output with the correct shape and dtype.

I have proposed a change in https://github.com/numpy/numpy/pull/26908
that makes both these features possible. A field is added to the
PyUFuncObject that is an optional user-defined C function that the
gufunc author implements. When a gufunc is called, this function is
called with an array of the values of the core dimensions of the input
and output arrays. Some or all of the output core dimensions might be
-1, meaning the arrays are to be allocated by the gufunc/iterator
machinery.  The new "hook" allows the user to check the given core
dimensions and raise an exception if some constraint is not satisfied.
The user-defined function can also replace those -1 values with sizes
that it computes based on the given input core dimensions.

To define the 1-d convolution gufunc, the actual shape signature that
is passed to `PyUFunc_FromFuncAndDataAndSignature` is `(m),(n)->(p)`.
When a user passes arrays with shapes, say, (20,) and (30,) as the
input and with no output array specified, the user-defined function
will get the array [20, 30, -1]. It would replace -1 with m + n - 1 =
49 and return. If the caller *does* include an output array in the
call, the core dimension of that array will be the third element of
the array passed to the user-defined function. In that case, the
function verifies that the value equals m + n - 1, and raises an
exception if it doesn't.

Here's that 1-d convolution, called `conv1d_full` here, in action:

```
In [14]: import numpy as np

In [15]: from experiment import conv1d_full

In [16]: type(conv1d_full)
Out[16]: numpy.ufunc
```

`m = 4`, `n = 6`, so the output shape is `p = m + n - 1 = 9`:

```
In [17]: conv1d_full([1, 2, 3, 4], [-1, 1, 2, 1.5, -2, 1])
Out[17]: array([-1. , -1. , 1. , 4.5, 11. , 9.5, 2. , -5. , 4. ])
```

Standard broadcasting:

```
In [18]: conv1d_full([[1, 2, 3, 4], [0.5, 0, -1, 1]], [-1, 1, 2, 1.5, -2, 1])
Out[18]:
array([[-1. , -1. , 1. , 4.5 , 11. , 9.5 , 2. , -5. , 4. ],
[-0.5 , 0.5 , 2. , -1.25, -2. , 1. , 3.5 , -3. , 1. ]])
```

Comments here or over in the pull request are welcome. The essential
changes to the source code are just 7 lines in `ufunc_object.c` and 7
lines in `ufuncobject.h`. The rest of the changes in the PR create a
couple gufuncs that use the new feature, with corresponding unit
tests.

Warren
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Enhancement for generalized ufuncs

2024-07-12 Thread Warren Weckesser
On Fri, Jul 12, 2024 at 7:47 AM Sebastian Berg
 wrote:
>

> (You won't be able to know these relations from reading the signature,
> but I doubt it's worth worrying about that.)

After creating the gufunc with `PyUFunc_FromFuncAndDataAndSignature`,
the gufunc author could set the `core_signature` field at the same
time that `process_core_dims_func` is set.  That requires freeing the
old signature and allocating memory for the new one.  For the 1-d
convolution example, the signature would be set to `"(m),(n)->(m + n -
1)"`:

```
In [1]: from experiment import conv1d_full

In [2]: conv1d_full.signature
Out[2]: '(m),(n)->(m + n - 1)'
```

Warren
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Enhancement for generalized ufuncs

2024-07-12 Thread Warren Weckesser
On Fri, Jul 12, 2024 at 2:35 PM Sebastian Berg
 wrote:
>
> On Fri, 2024-07-12 at 09:56 -0400, Warren Weckesser wrote:
> > On Fri, Jul 12, 2024 at 7:47 AM Sebastian Berg
> >  wrote:
> > >
> >
> > > (You won't be able to know these relations from reading the
> > > signature,
> > > but I doubt it's worth worrying about that.)
> >
> > After creating the gufunc with `PyUFunc_FromFuncAndDataAndSignature`,
> > the gufunc author could set the `core_signature` field at the same
> > time that `process_core_dims_func` is set.  That requires freeing the
> > old signature and allocating memory for the new one.  For the 1-d
> > convolution example, the signature would be set to `"(m),(n)->(m + n
> > -
> > 1)"`:
> >
> > ```
> > In [1]: from experiment import conv1d_full
> >
> > In [2]: conv1d_full.signature
> > Out[2]: '(m),(n)->(m + n - 1)'
> > ```
>
>
> I have to look at the PR, but the ufunc parses the signature only once?

Yes.  The ufunc doesn't need to save the signature string once it has
parsed it.  As the comment about it  in `ufuncobject.h` says, the
`core_signature` field (type `char *`) is "for printing purposes".

Updating the `core_signature` is optional.  I hadn't thought about it
until you mentioned it above, but it doesn't seem like a bad idea.  I
added this to the `experiment` module at
https://github.com/WarrenWeckesser/experiments/blob/main/python/numpy/gufunc-process_core_dims_func/experiment.c,
where I now have a helper function `reset_core_signature(PyUFuncObject
*gufunc, char *new_signature)` to update the field.  This is not part
of the pull request.  I'm not sure if that is something we would
suggest as the recommended practice for this feature.

> That solution seems very hacky, but allowing to just replace the
> signature may make sense.
> (Downside is, if someone else wants to parse the original signature,
> but I guess it is unlikely.)
>
> In either case, the only other thing to hook into would be the
> signature parsing itself with the full shapes available.  But then you
> may need to deal with `axes=`, etc. as well, so I think your solution
> that only adjusts shapes seems better.
> It's much simpler and should cover most or even all relevant things.

Yes, for the use-cases that I can think of, all that matters are the
values of the core dimensions of the input and output arrays when the
gufunc is called.

Warren

>
> - Sebastian
>
>
>
> >
> > Warren
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sebast...@sipsolutions.net
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Welcome Joren Hammudoglu to the NumPy Maintainers Team

2024-08-19 Thread Warren Weckesser
On Mon, Aug 19, 2024 at 6:45 AM Sebastian Berg
 wrote:
>
> Hi all,
>
> please join me in welcoming Joren (https://github.com/jorenham) to the
> NumPy maintainers team.
>

Welcome Joren!

Warren

> Joren has done a lot of work recently contributing, reviewing, and
> maintaining typing related improvements to NumPy.
> We are looking forward to see new momentum to improve NumPy typing.
>
> Thanks for all the contributions!
>
> Cheers,
>
> Sebastian
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: ENH: Uniform interface for accessing minimum or maximum value of a dtype

2024-08-26 Thread Warren Weckesser
On Mon, Aug 26, 2024 at 5:42 PM Sebastian Berg
 wrote:
>
> On Mon, 2024-08-26 at 11:26 -0400, Marten van Kerkwijk wrote:
> > I think a NEP is a good idea.  It would also seem to make sense to
> > consider how the dtype itself can hold/calculate this type of
> > information, since that will be the only way a generic ``info()``
> > function can get information for a user-defined dtype.  Indeed,
> > taking
> > that further, might a method or property on the dtype itself be
> > the cleaner interface?  I.e., one would do ``dtype.info().min`` or
> > ``dtype.info.min``.
> >
>
> I agree, I think it should be properties/attributes (I don't think it
> needs to be a function, it should be cheap?)
> Now it might also be that `np.finfo()` could keep working via
> `dtype.finfo` or a dunder if we want to hide it.
>
> In general, I would lean towards some form of attributes, even if I am
> not sure if they should be `.info`, `.finfo`, or even directly on the
> dtype.
> (`.info.min` seems tricky, because I am not sure it is clear whether
> inf or the minimum finite value is "min".)
>
> A (potentially very short) NEP would probably help to get momentum on
> making a decision.  I certainly would like to see this being worked on!
>
> - Sebastian
>

A namespace attached to the dtype to hold useful constants seems like
a good approach.

This could also be used to hold type-dependent constants such as `pi`,
`e`, etc. for the real floating point types.  Over in
https://github.com/numpy/numpy/issues/9698, I suggested the name
`constants` 
(https://github.com/numpy/numpy/issues/9698#issuecomment-2186653455).
This would also be available for user-defined dtypes, where types such
as  `quaddtype`
(https://github.com/numpy/numpy-user-dtypes/tree/main/quaddtype) ,
`mpfdtype` (https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype),
and `logfloat32` (https://github.com/WarrenWeckesser/numtypes) would
make available their own representations of `pi`, `e`, etc.

There will be some work required to define the semantics of the
existing attributes.  Not all attributes can be required for all data
types.  For example, a few considerations off the top of my head:

* The `min` and `max` values for `datetime64` andf `timedelta64` would
have values that depend on the time unit.
* Floating point types that are not IEEE 754 such as IBM double-double
wouldn't necessarily have all the attributes IEEE 754 float types
have.
* The StringDType has a well-defined `min` (the empty string), but not a `max`.

Warren


> > -- Marten
> >
> > Nathan  writes:
> >
> > > That seems reasonable to me on its face. There are some corner
> > > cases to work out though.
> > >
> > > Swayam is tinkering with a quad precision dtype written using rhe
> > > new DType API and just ran into the
> > > fact that finfo doesn’t support user dtypes:
> > >
> > > https://github.com/numpy/numpy/issues/27231
> > >
> > > IMO any new feature along these lines should have some thought in
> > > the design about how to handle
> > > user-defined data types.
> > >
> > > Another thing to consider is that data types can be non-numeric
> > > (things like categories) or number-like
> > > but not really just a number like a quantity with a physical unit.
> > > That means you should also think
> > > about what to do where fields like min and max don’t make any sense
> > > or need to be a generic python
> > > object rather than a numeric type.
> > >
> > > I think if someone proposed a NEP that fully worked this out it
> > > would be welcome. My understanding
> > > is that the array API consortium prefers to standardize on APIs
> > > that gain tractions in libraries rather
> > > than inventing APIs and telling libraries to adopt them, so I think
> > > a NEP is the right first step, rather
> > > than trying to standardize something in the array API.
> > >
> > > On Mon, Aug 26, 2024 at 8:06 AM Lucas Colley <
> > > lucas.coll...@gmail.com> wrote:
> > >
> > >  Or how about `np.dtype_info(dt)`, which could return an object
> > > with attributes like `min` and `max
> > >  `. Would that be possible?
> > >  ___
> > >  NumPy-Discussion mailing list -- numpy-discussion@python.org
> > >  To unsubscribe send an email to numpy-discussion-le...@python.org
> > >
> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > >  Member address: nathan12...@gmail.com
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sebast...@sipsolutions.net
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gm

Re: [Numpy-discussion] Deprecate zipf distribution?

2017-10-07 Thread Warren Weckesser
On Sat, Oct 7, 2017 at 11:29 AM, Charles R Harris  wrote:

> Hi All,
>
> The current NumPy implementation of the truncated zipf distribution has
> several drawbacks.
>
>
>- Extremely poor performance when the parameter `a` is near 1. For
>instance, when `a = 1.01` a simple change in the implementation speeds
>things up by a factor of 1,657. When the parameter is closer to 1, the
>algorithm effectively hangs.
>- Because the distribution is truncated, say to integers in the range
>of int64, the parameter could be allowed to take all values > 0, even
>though the untruncated series diverges. There is some indication that such
>values of `a` can be useful in modeling  because of the heavy distribution
>in the tail.
>
> Because fixing these problems will change the output stream, I suggest
> implementing a truncated zeta distribution, which is an alternative name
> for the same distribution, and deprecating the the zipf distribution.
> Furthermore, rather than truncate at the value of C long, which varies,
> truncate at max(int64), or some possibly smaller value, say 2**44, which
> allows all integers up to that value to be realized with approximately
> correct probabilities when using double precision for the intermediate
> computations.
>
> Thoughts?
>
>
It is time that the 'random' API is extended to include some means of
selecting a version of the random number generation algorithm.  This has
come up in discussions on github (e.g.
https://github.com/numpy/numpy/pull/5158#issuecomment-58185802).  Then
instead of deprecating the existing 'zipf`' function, the user has the
option of selecting which version of the code to use.  Current users that
are satisfied with the existing 'zipf' implementation are not affected.
But I'm not against deprecating 'zipf' if the code is bad enough that the
best long-term option is removing it.

Something like this will be needed if there is interest in merging a pull
request that I just submitted (https://github.com/numpy/numpy/pull/9834)
that fixes (and improves the performance of) the generation of
hypergeometric variates when the number of samples drawn is small.

Warren



> Chubk
>

I think Chuck just got a new hip-hop name. :)



>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: softmax

2018-03-14 Thread Warren Weckesser
On Wed, Mar 14, 2018 at 4:05 AM, Kulick, Johannes 
wrote:

> Hi,
>
>
>
> I regularly need the softmax function (https://en.wikipedia.org/
> wiki/Softmax_function) for my code. I have a quite efficient pure python
> implementation (credits to Nolan Conaway). I think it would be a valuable
> enhancement of the ndarray class. But since it is kind of a specialty
> function I wanted to ask you if you would consider it to be part of the
> numpy core (alongside ndarray.max and ndarray.argmax) or rather in scipy
> (e.g. scipy.stats seems also an appropriate place).
>
>

Johannes,

If the numpy devs aren't interested in adding it to numpy, I'm pretty sure
we can get it in scipy.  I've had adding it (or at least proposing that it
be added) to scipy on my to-do list for quite a while now.

Warren



>
>
> Best
>
> Johannes
>
>
>
> Amazon Development Center Germany GmbH
> Berlin - Dresden - Aachen
> main office: Krausenstr. 38, 10117 Berlin
> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> Ust-ID: DE289237879
> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Casting scalars

2018-05-10 Thread Warren Weckesser
On Thu, May 10, 2018 at 10:53 PM, Hameer Abbasi 
wrote:

> Yes, that I know. I meant given a dtype string such as 'uint8' or a
> dtype object. I know I can possibly do np.array(scalar,
> dtype=dtype)[()] but I was looking for a less hacky method.



Apparently the `dtype` object has the attribute `type` that creates objects
of that dtype.

For example,

In [30]: a
Out[30]: array([ 1.,  2.,  3.])

In [31]: dt = a.dtype

In [32]: dt
Out[32]: dtype('float64')

In [33]: x = dt.type(8675309)  # Convert the scalar to a's dtype.

In [34]: x
Out[34]: 8675309.0

In [35]: type(x)
Out[35]: numpy.float64


Warren




> On
> 11/05/2018 at 07:50, Stuart wrote: np.float(scalar) On Thu, May 10,
> 2018 at 7:49 PM Hameer Abbasi  wrote:
> Hello, everyone! I might be missing something and this might be a very
> stupid and redundant question, but is there a way to cast a scalar to
> a given dtype? Hameer ___
> NumPy-Discussion mailing list NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-03 Thread Warren Weckesser
On Sat, Jun 2, 2018 at 3:04 PM, Robert Kern  wrote:

> As promised distressingly many months ago, I have written up a NEP about
> relaxing the stream-compatibility policy that we currently have.
>
> https://github.com/numpy/numpy/pull/11229
> https://github.com/rkern/numpy/blob/nep/rng/doc/neps/
> nep-0019-rng-policy.rst
>
> I particularly invite comment on the two lists of methods that we still
> would make strict compatibility guarantees for.
>
> ---
>


Thanks, Robert.   It looks like you are neatly cutting the Gordian Knot of
API versioning in numpy.random!  I don't have any specific comments, except
that it will be great to have *something* other than the status quo, so we
can starting improving the existing numpy.random functions.

Warren



> ==
> Random Number Generator Policy
> ==
>
> :Author: Robert Kern 
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-05-24
>
>
> Abstract
> 
>
> For the past decade, NumPy has had a strict backwards compatibility policy
> for
> the number stream of all of its random number distributions.  Unlike other
> numerical components in ``numpy``, which are usually allowed to return
> different when results when they are modified if they remain correct, we
> have
> obligated the random number distributions to always produce the exact same
> numbers in every version.  The objective of our stream-compatibility
> guarantee
> was to provide exact reproducibility for simulations across numpy versions
> in
> order to promote reproducible research.  However, this policy has made it
> very
> difficult to enhance any of the distributions with faster or more accurate
> algorithms.  After a decade of experience and improvements in the
> surrounding
> ecosystem of scientific software, we believe that there are now better
> ways to
> achieve these objectives.  We propose relaxing our strict
> stream-compatibility
> policy to remove the obstacles that are in the way of accepting
> contributions
> to our random number generation capabilities.
>
>
> The Status Quo
> --
>
> Our current policy, in full:
>
> A fixed seed and a fixed series of calls to ``RandomState`` methods
> using the
> same parameters will always produce the same results up to roundoff
> error
> except when the values were incorrect.  Incorrect values will be fixed
> and
> the NumPy version in which the fix was made will be noted in the
> relevant
> docstring.  Extension of existing parameter ranges and the addition of
> new
> parameters is allowed as long the previous behavior remains unchanged.
>
> This policy was first instated in Nov 2008 (in essence; the full set of
> weasel
> words grew over time) in response to a user wanting to be sure that the
> simulations that formed the basis of their scientific publication could be
> reproduced years later, exactly, with whatever version of ``numpy`` that
> was
> current at the time.  We were keen to support reproducible research, and
> it was
> still early in the life of ``numpy.random``.  We had not seen much cause to
> change the distribution methods all that much.
>
> We also had not thought very thoroughly about the limits of what we really
> could promise (and by “we” in this section, we really mean Robert Kern,
> let’s
> be honest).  Despite all of the weasel words, our policy overpromises
> compatibility.  The same version of ``numpy`` built on different
> platforms, or
> just in a different way could cause changes in the stream, with varying
> degrees
> of rarity.  The biggest is that the ``.multivariate_normal()`` method
> relies on
> ``numpy.linalg`` functions.  Even on the same platform, if one links
> ``numpy``
> with a different LAPACK, ``.multivariate_normal()`` may well return
> completely
> different results.  More rarely, building on a different OS or CPU can
> cause
> differences in the stream.  We use C ``long`` integers internally for
> integer
> distribution (it seemed like a good idea at the time), and those can vary
> in
> size depending on the platform.  Distribution methods can overflow their
> internal C ``longs`` at different breakpoints depending on the platform and
> cause all of the random variate draws that follow to be different.
>
> And even if all of that is controlled, our policy still does not provide
> exact
> guarantees across versions.  We still do apply bug fixes when correctness
> is at
> stake.  And even if we didn’t do that, any nontrivial program does more
> than
> just draw random numbers.  They do computations on those numbers, transform
> those with numerical algorithms from the rest of ``numpy``, which is not
> subject to so strict a policy.  Trying to maintain stream-compatibility
> for our
> random number distributions does not help reproducible research for these
> reasons.
>
> The standard practice now for bit-for-bit reproducible research is to pin
> all
> of the versions of code of your software stack, possibly d

Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-03 Thread Warren Weckesser
On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers 
wrote:

>
>
> On Sun, Jun 3, 2018 at 6:54 PM,  wrote:
>
>>
>>
>> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern 
>> wrote:
>>
>>> On Sun, Jun 3, 2018 at 5:46 PM  wrote:
>>>


 On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern 
 wrote:

>
> The list of ``StableRandom`` methods should be chosen to support unit
>> tests:
>>
>> * ``.randint()``
>> * ``.uniform()``
>> * ``.normal()``
>> * ``.standard_normal()``
>> * ``.choice()``
>> * ``.shuffle()``
>> * ``.permutation()``
>>
>
> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
> @bashtage writes:
> > standard_gamma and standard_exponential are important enough to be
> included here IMO.
>
> "Importance" was not my criterion, only whether they are used in unit
> test suites. This list was just off the top of my head for methods that I
> think were actually used in test suites, so I'd be happy to be shown live
> tests that use other methods. I'd like to be a *little* conservative about
> what methods we stick in here, but we don't have to be *too* conservative,
> since we are explicitly never going to be modifying these.
>

 That's one area where I thought the selection is too narrow.
 We should be able to get a stable stream from the uniform for some
 distributions.

 However, according to the Wikipedia description Poisson doesn't look
 easy. I just wrote a unit test for statsmodels using Poisson random numbers
 with hard coded numbers for the regression tests.

>>>
>>> I'd really rather people do this than use StableRandom; this is best
>>> practice, as I see it, if your tests involve making precise comparisons to
>>> expected results.
>>>
>>
>> I hardcoded the results not the random data. So the unit tests rely on a
>> reproducible stream of Poisson random numbers.
>> I don't want to save 500 (100 or 1000) observations in a csv file for
>> every variation of the unit test that I run.
>>
>
> I agree, hardcoding numbers in every place where seeded random numbers are
> now used is quite unrealistic.
>
> It may be worth having a look at test suites for scipy, statsmodels,
> scikit-learn, etc. and estimate how much work this NEP causes those
> projects. If the devs of those packages are forced to do large scale
> migrations from RandomState to StableState, then why not instead keep
> RandomState and just add a new API next to it?
>
>

As a quick and imperfect test, I monkey-patched numpy so that a call to
numpy.random.seed(m) actually uses m+1000 as the seed.  I ran the tests
using the `runtests.py` script:

*seed+1000, using 'python runtests.py -n' in the source directory:*

  236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7
xpassed


Most of the failures are in scipy.stats:

*seed+1000, using 'python runtests.py -n -s stats' in the source directory:*

  203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed


Changing the amount added to the seed or running the tests using the
function `scipy.test("full")` gives different (but similar magnitude)
results:

*seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:*

  269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed

*seed+1, using 'python runtests.py -n' in the source directory:*

  305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7
xpassed


I suspect many of the tests will be easy to update, so fixing 300 or so
tests does not seem like a monumental task.  I haven't looked into why
there are 585 deselected tests; maybe there are many more tests lurking
there that will have to be updated.

Warren



Ralf
>
>
>
>>
>>
>>>
>>> StableRandom is intended as a crutch so that the pain of moving existing
>>> unit tests away from the deprecated RandomState is less onerous. I'd really
>>> rather people write better unit tests!
>>>
>>> In particular, I do not want to add any of the integer-domain
>>> distributions (aside from shuffle/permutation/choice) as these are the ones
>>> that have the platform-dependency issues with respect to 32/64-bit `long`
>>> integers. They'd be unreliable for unit tests even if we kept them stable
>>> over time.
>>>
>>>
 I'm not sure which other distributions are common enough and not easily
 reproducible by transformation. E.g. negative binomial can be reproduces by
 a gamma-poisson mixture.

 On the other hand normal can be easily recreated from standard_normal.

>>>
>>> I was mostly motivated by making it a bit easier to mechanically replace
>>> uses of randn(), which is probably even more common than normal() and
>>> standard_normal() in unit tests.
>>>
>>>
 Would it be difficult to keep this list large, given that it should be
 frozen, low maintenance code ?

>>>
>>> I admit that I had in mind non-statistical unit tests. That is, tests

Re: [Numpy-discussion] count_nonzero axis argument?

2018-09-17 Thread Warren Weckesser
On Mon, Sep 17, 2018 at 7:38 AM Matthew Brett 
wrote:

> Hi,
>
> Is there any reason that np.count_nonzero should not take an axis
> argument?  As in:
>
> >>> np.better_count_nonzero([[10, 11], [0, 3]], axis=1)
> array([2, 1])
>
>

It already does (since version 1.12.0):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.count_nonzero.html

Warren


It would be much more useful if it did...
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] align `choices` and `sample` with Python `random` module

2018-12-10 Thread Warren Weckesser
On 12/10/18, Ralf Gommers  wrote:
> On Sun, Dec 9, 2018 at 2:00 PM Alan Isaac  wrote:
>
>> I believe this was proposed in the past to little enthusiasm,
>> with the response, "you're using a library; learn its functions".
>>
>
> Not only that, NumPy and the core libraries around it are the standard for
> numerical/statistical computing. If core Python devs want to replicate a
> small subset of that functionality in a new Python version like 3.6, it
> would be sensible for them to choose compatible names. I don't think
> there's any justification for us to bother our users based on new things
> that get added to the stdlib.
>
>
>> Nevertheless, given the addition of `choices` to the Python
>> random module in 3.6, it would be nice to have the *same name*
>> for parallel functionality in numpy.random.
>>
>> And given the redundancy of numpy.random.sample, it would be
>> nice to deprecate it with the intent to reintroduce
>> the name later, better aligned with Python's usage.
>>
>
> No, there is nothing wrong with the current API, so I'm -10 on deprecating
> it.

Actually, the `numpy.random.choice` API has one major weakness.  When
`replace` is False and `size` is greater than 1, the function is actually
drawing a *one* sample from a multivariate distribution.  For the other
multivariate distributions (multinomial, multivariate_normal and
dirichlet), `size` sets the number of samples to draw from the
distribution.  With `replace=False` in `choice`, size becomes a *parameter*
of the distribution, and it is only possible to draw one (multivariate)
sample.

I thought about this some time ago, and came up with an API that eliminates
the boolean flag, and separates the `size` argument from the number of
items drawn in one sample, which I'll call `nsample`. To avoid creating a
"false friend" with the standard library and with numpy's `choice`, I'll
call this function `select`.

Here's the proposed signature and docstring.  (A prototype implementation
is in a gist at
https://gist.github.com/WarrenWeckesser/2e5905d116e710914af383ee47adc2bf.)
The key feature is the `nsample` argument, which sets how many items to
select from the given collection.  Also note that this function is *always*
drawing *without replacement*.  It covers the `replace=True` case because
drawing one item without replacement is the same as drawing one item with
replacement.

Whether or not an API like the following is used, it would be nice if there
was some way to get multiple samples in the `replace=False` case in one
function call.

def select(items, nsample=None, p=None, size=None):
"""
Select random samples from `items`.

The function randomly selects `nsample` items from `items` without
replacement.

Parameters
--
items : sequence
The collection of items from which the selection is made.
nsample : int, optional
Number of items to select without replacement in each draw.
It must be between 0 and len(items), inclusize.
p : array-like of floats, same length as `items, optional
Probabilities of the items.  If this argument is not given,
the elements in `items` are assumed to have equal probability.
size : int, optional
Number of variates to draw.

Notes
-
`size=None` means "generate a single selection".

If `size` is None, the result is equivalent to
numpy.random.choice(items, size=nsample, replace=False)

`nsample=None` means draw one (scalar) sample.
If `nsample` is None, the functon acts (almost) like nsample=1 (see
below for more information), and the result is equivalent to
numpy.random.choice(items, size=size)
In effect, it does choice with replacement.  The case `nsample=None`
can be interpreted as each sample is a scalar, and `nsample=k`
means each sample is a sequence with length k.

If `nsample` is not None, it must be a nonnegative integer with
0 <= nsample <= len(items).

If `size` is not None, it must be an integer or a tuple of integers.
When `size` is an integer, it is treated as the tuple ``(size,)``.

When both `nsample` and `size` are not None, the result
has shape ``size + (nsample,)``.

Examples

Make 6 choices with replacement from [10, 20, 30, 40].  (This is
equivalent to "Make 1 choice without replacement from [10, 20, 30, 40];
do it six times.")

>>> select([10, 20, 30, 40], size=6)
array([20, 20, 40, 10, 40, 30])

Choose two items from [10, 20, 30, 40] without replacement.  Do it six
times.

>>> select([10, 20, 30, 40], nsample=2, size=6)
array([[40, 10],
   [20, 30],
   [10, 40],
   [30, 10],
   [10, 30],
   [10, 20]])

When `nsample` is an integer, there is always an axis at the end of the
result with length `nsample`, even when `nsample=1`.  For example, the
shape of the array returned in the following call is (2, 3, 1)

>>> select([1

Re: [Numpy-discussion] align `choices` and `sample` with Python `random` module

2018-12-11 Thread Warren Weckesser
On Tue, Dec 11, 2018 at 10:32 AM Ralf Gommers 
wrote:

>
>
> On Mon, Dec 10, 2018 at 10:27 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>>
>>
>> On 12/10/18, Ralf Gommers  wrote:
>> > On Sun, Dec 9, 2018 at 2:00 PM Alan Isaac  wrote:
>> >
>> >> I believe this was proposed in the past to little enthusiasm,
>> >> with the response, "you're using a library; learn its functions".
>> >>
>> >
>> > Not only that, NumPy and the core libraries around it are the standard
>> for
>> > numerical/statistical computing. If core Python devs want to replicate a
>> > small subset of that functionality in a new Python version like 3.6, it
>> > would be sensible for them to choose compatible names. I don't think
>> > there's any justification for us to bother our users based on new things
>> > that get added to the stdlib.
>> >
>> >
>> >> Nevertheless, given the addition of `choices` to the Python
>> >> random module in 3.6, it would be nice to have the *same name*
>> >> for parallel functionality in numpy.random.
>> >>
>> >> And given the redundancy of numpy.random.sample, it would be
>> >> nice to deprecate it with the intent to reintroduce
>> >> the name later, better aligned with Python's usage.
>> >>
>> >
>> > No, there is nothing wrong with the current API, so I'm -10 on
>> deprecating
>> > it.
>>
>> Actually, the `numpy.random.choice` API has one major weakness.  When
>> `replace` is False and `size` is greater than 1, the function is actually
>> drawing a *one* sample from a multivariate distribution.  For the other
>> multivariate distributions (multinomial, multivariate_normal and
>> dirichlet), `size` sets the number of samples to draw from the
>> distribution.  With `replace=False` in `choice`, size becomes a *parameter*
>> of the distribution, and it is only possible to draw one (multivariate)
>> sample.
>>
>
> I'm not sure I follow. `choice` draws samples from a given 1-D array, more
> than 1:
>
> In [12]: np.random.choice(np.arange(5), size=2, replace=True)
> Out[12]: array([2, 2])
>
> In [13]: np.random.choice(np.arange(5), size=2, replace=False)
> Out[13]: array([3, 0])
>
> The multivariate distribution you're talking about is for generating the
> indices I assume. Does the current implementation actually give a result
> for size>1 that has different statistic properties from calling the
> function N times with size=1? If so, that's definitely worth a bug report
> at least (I don't think there is one for this).
>
>
There is no bug, just a limitation in the API.

When I draw without replacement, say, three values from a collection of
length five, the three values that I get are not independent.  So really,
this is *one* sample from a three-dimensional (discrete-valued)
distribution.  The problem with the current API is that I can't get
multiple samples from this three-dimensional distribution in one call.  If
I need to repeat the process six times, I have to use a loop, e.g.:

>>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False,
size=3) for _ in range(6)]

With the `select` function I described in my previous email, which I'll
call `random_select` here, the parameter that determines the number of
items per sample, `nsample`, is separate from the parameter that determines
the number of samples, `size`:

>>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
>>> samples
array([[30, 40, 50],
   [40, 50, 30],
   [10, 20, 40],
   [20, 30, 50],
   [40, 20, 50],
   [20, 10, 30]])

(`select` is a really bad name, since `numpy.select` already exists and is
something completely different.  I had the longer name `random.select` in
mind when I started using it. "There are only two hard problems..." etc.)

Warren



> Cheers,
> Ralf
>
>
>
>> I thought about this some time ago, and came up with an API that
>> eliminates the boolean flag, and separates the `size` argument from the
>> number of items drawn in one sample, which I'll call `nsample`. To avoid
>> creating a "false friend" with the standard library and with numpy's
>> `choice`, I'll call this function `select`.
>>
>> Here's the proposed signature and docstring.  (A prototype implementation
>> is in a gist at
>> https://gist.github.com/WarrenWeckesser/2e5905d116e710914af383ee47adc2bf.)
>> The key feature is the `nsample` argument, which sets 

Re: [Numpy-discussion] align `choices` and `sample` with Python `random` module

2018-12-11 Thread Warren Weckesser
On Tue, Dec 11, 2018 at 1:37 PM Warren Weckesser 
wrote:

>
>
> On Tue, Dec 11, 2018 at 10:32 AM Ralf Gommers 
> wrote:
>
>>
>>
>> On Mon, Dec 10, 2018 at 10:27 AM Warren Weckesser <
>> warren.weckes...@gmail.com> wrote:
>>
>>>
>>>
>>> On 12/10/18, Ralf Gommers  wrote:
>>> > On Sun, Dec 9, 2018 at 2:00 PM Alan Isaac 
>>> wrote:
>>> >
>>> >> I believe this was proposed in the past to little enthusiasm,
>>> >> with the response, "you're using a library; learn its functions".
>>> >>
>>> >
>>> > Not only that, NumPy and the core libraries around it are the standard
>>> for
>>> > numerical/statistical computing. If core Python devs want to replicate
>>> a
>>> > small subset of that functionality in a new Python version like 3.6, it
>>> > would be sensible for them to choose compatible names. I don't think
>>> > there's any justification for us to bother our users based on new
>>> things
>>> > that get added to the stdlib.
>>> >
>>> >
>>> >> Nevertheless, given the addition of `choices` to the Python
>>> >> random module in 3.6, it would be nice to have the *same name*
>>> >> for parallel functionality in numpy.random.
>>> >>
>>> >> And given the redundancy of numpy.random.sample, it would be
>>> >> nice to deprecate it with the intent to reintroduce
>>> >> the name later, better aligned with Python's usage.
>>> >>
>>> >
>>> > No, there is nothing wrong with the current API, so I'm -10 on
>>> deprecating
>>> > it.
>>>
>>> Actually, the `numpy.random.choice` API has one major weakness.  When
>>> `replace` is False and `size` is greater than 1, the function is actually
>>> drawing a *one* sample from a multivariate distribution.  For the other
>>> multivariate distributions (multinomial, multivariate_normal and
>>> dirichlet), `size` sets the number of samples to draw from the
>>> distribution.  With `replace=False` in `choice`, size becomes a *parameter*
>>> of the distribution, and it is only possible to draw one (multivariate)
>>> sample.
>>>
>>
>> I'm not sure I follow. `choice` draws samples from a given 1-D array,
>> more than 1:
>>
>> In [12]: np.random.choice(np.arange(5), size=2, replace=True)
>> Out[12]: array([2, 2])
>>
>> In [13]: np.random.choice(np.arange(5), size=2, replace=False)
>> Out[13]: array([3, 0])
>>
>> The multivariate distribution you're talking about is for generating the
>> indices I assume. Does the current implementation actually give a result
>> for size>1 that has different statistic properties from calling the
>> function N times with size=1? If so, that's definitely worth a bug report
>> at least (I don't think there is one for this).
>>
>>
> There is no bug, just a limitation in the API.
>
> When I draw without replacement, say, three values from a collection of
> length five, the three values that I get are not independent.  So really,
> this is *one* sample from a three-dimensional (discrete-valued)
> distribution.  The problem with the current API is that I can't get
> multiple samples from this three-dimensional distribution in one call.  If
> I need to repeat the process six times, I have to use a loop, e.g.:
>
> >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False,
> size=3) for _ in range(6)]
>
> With the `select` function I described in my previous email, which I'll
> call `random_select` here, the parameter that determines the number of
> items per sample, `nsample`, is separate from the parameter that determines
> the number of samples, `size`:
>
> >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
> >>> samples
> array([[30, 40, 50],
>[40, 50, 30],
>[10, 20, 40],
>[20, 30, 50],
>[40, 20, 50],
>[20, 10, 30]])
>
> (`select` is a really bad name, since `numpy.select` already exists and is
> something completely different.  I had the longer name `random.select` in
> mind when I started using it. "There are only two hard problems..." etc.)
>
>

As I reread this, I see another naming problem:  "sample" is used to mean
different things.  In my description above,  I referred to one "sample" as
the length

Re: [Numpy-discussion] align `choices` and `sample` with Python `random` module

2018-12-11 Thread Warren Weckesser
On Tue, Dec 11, 2018 at 2:27 PM Stephan Hoyer  wrote:

> On Tue, Dec 11, 2018 at 10:39 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>> There is no bug, just a limitation in the API.
>>
>> When I draw without replacement, say, three values from a collection of
>> length five, the three values that I get are not independent.  So really,
>> this is *one* sample from a three-dimensional (discrete-valued)
>> distribution.  The problem with the current API is that I can't get
>> multiple samples from this three-dimensional distribution in one call.  If
>> I need to repeat the process six times, I have to use a loop, e.g.:
>>
>> >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False,
>> size=3) for _ in range(6)]
>>
>> With the `select` function I described in my previous email, which I'll
>> call `random_select` here, the parameter that determines the number of
>> items per sample, `nsample`, is separate from the parameter that determines
>> the number of samples, `size`:
>>
>> >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
>> >>> samples
>> array([[30, 40, 50],
>>[40, 50, 30],
>>[10, 20, 40],
>>[20, 30, 50],
>>[40, 20, 50],
>>[20, 10, 30]])
>>
>> (`select` is a really bad name, since `numpy.select` already exists and
>> is something completely different.  I had the longer name `random.select`
>> in mind when I started using it. "There are only two hard problems..." etc.)
>>
>> Warren
>>
>
> This is an issue for the probability distributions from scipy.stats, too.
>
> The only library that I know handles this well is TensorFlow Probability,
> which has a notion of "batch" vs "events" dimensions in distributions. It's
> actually pretty comprehensive, and makes it easy to express these sorts of
> operations:
>
> >>> import tensorflow_probability as tfp
> >>> import tensorflow as tf
> >>> tf.enable_eager_execution()
> >>> dist = tfp.distributions.Categorical(tf.zeros((3, 5)))
> >>> dist
>  event_shape=() dtype=int32>
> >>> dist.sample(6)
>  [2, 1, 3], [4, 4, 2], [0, 1, 1], [0, 2, 2], [2, 0, 4]], dtype=int32)>
>


Yes, tensorflow-probability includes broadcasting of the parameters and
generating multiple variates in one call, but note that your example is not
sampling without replacement.  For sampling 3 items without replacement
from a population, the *event_shape* (to use tensorflow-probability
terminology) would have to be (3,).

Warren


___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] #4808 -- let np.pad default to constant

2019-03-13 Thread Warren Weckesser
On 3/13/19, Stefan van der Walt  wrote:
> In PR 4808, I propose changing the default padding mode (for `np.pad`)
> to constant (0).

+1

Warren

>
> It was suggested that I mention the change here, in case someone has a
> use case or argument for not making it.
>
> https://github.com/numpy/numpy/pull/4808
>
> Thanks!
> Stéfan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: SciPy 1.3.0

2019-05-17 Thread Warren Weckesser
ave been deprecated since v1.0.0 are removed.
> `SciPy documentation for
> v1.1.0 <https://docs.scipy.org/doc/scipy-1.1.0/reference/misc.html>`__
> can be used to track the new import locations for the relocated functions.
>
> `scipy.linalg` changes
> -
>
> For ``pinv``, ``pinv2``, and ``pinvh``, the default cutoff values are
> changed
> for consistency (see the docs for the actual values).
>
> `scipy.optimize` changes
> -
>
> The default method for ``linprog`` is now ``'interior-point'``. The
> method's
> robustness and speed come at a cost: solutions may not be accurate to
> machine precision or correspond with a vertex of the polytope defined
> by the constraints. To revert to the original simplex method,
> include the argument ``method='simplex'``.
>
> `scipy.stats` changes
> 
>
> Previously, ``ks_2samp(data1, data2)`` would run a two-sided test and
> return
> the approximated p-value. The new signature, ``ks_2samp(data1, data2,
> alternative="two-sided", method="auto")``, still runs the two-sided test by
> default but returns the exact p-value for small samples and the
> approximated
> value for large samples. ``method="asymp"`` would be equivalent to the
> old version but ``auto`` is the better choice.
>
> Other changes
> =
>
> Our tutorial has been expanded with a new section on global optimizers
>
> There has been a rework of the ``stats.distributions`` tutorials.
>
> `scipy.optimize` now correctly sets the convergence flag of the result to
> ``CONVERR``, a convergence error, for bounded scalar-function root-finders
> if the maximum iterations has been exceeded, ``disp`` is false, and
> ``full_output`` is true.
>
> `scipy.optimize.curve_fit` no longer fails if ``xdata`` and ``ydata``
> dtypes
> differ; they are both now automatically cast to ``float64``.
>
> `scipy.ndimage` functions including ``binary_erosion``, ``binary_closing``,
> and
> ``binary_dilation`` now require an integer value for the number of
> iterations,
> which alleviates a number of reported issues.
>
> Fixed normal approximation in case ``zero_method == "pratt"`` in
> `scipy.stats.wilcoxon`.
>
> Fixes for incorrect probabilities, broadcasting issues and thread-safety
> related to stats distributions setting member variables inside
> ``_argcheck()``.
>
> `scipy.optimize.newton` now correctly raises a ``RuntimeError``, when
> default
> arguments are used, in the case that a derivative of value zero is
> obtained,
> which is a special case of failing to converge.
>
> A draft toolchain roadmap is now available, laying out a compatibility plan
> including Python versions, C standards, and NumPy versions.
>
>
> Authors
> ===
>
> * ananyashreyjain +
> * ApamNapat +
> * Scott Calabrese Barton +
> * Christoph Baumgarten
> * Peter Bell +
> * Jacob Blomgren +
> * Doctor Bob +
> * Mana Borwornpadungkitti +
> * Matthew Brett
> * Evgeni Burovski
> * CJ Carey
> * Vega Theil Carstensen +
> * Robert Cimrman
> * Forrest Collman +
> * Pietro Cottone +
> * David +
> * Idan David +
> * Christoph Deil
> * Dieter Werthmüller
> * Conner DiPaolo +
> * Dowon
> * Michael Dunphy +
> * Peter Andreas Entschev +
> * Gökçen Eraslan +
> * Johann Faouzi +
> * Yu Feng
> * Piotr Figiel +
> * Matthew H Flamm
> * Franz Forstmayr +
> * Christoph Gohlke
> * Richard Janis Goldschmidt +
> * Ralf Gommers
> * Lars Grueter
> * Sylvain Gubian
> * Matt Haberland
> * Yaroslav Halchenko
> * Charles Harris
> * Lindsey Hiltner
> * JakobStruye +
> * He Jia +
> * Jwink3101 +
> * Greg Kiar +
> * Julius Bier Kirkegaard
> * John Kirkham +
> * Thomas Kluyver
> * Vladimir Korolev +
> * Joseph Kuo +
> * Michael Lamparski +
> * Eric Larson
> * Denis Laxalde
> * Katrin Leinweber
> * Jesse Livezey
> * ludcila +
> * Dhruv Madeka +
> * Magnus +
> * Nikolay Mayorov
> * Mark Mikofski
> * Jarrod Millman
> * Markus Mohrhard +
> * Eric Moore
> * Andrew Nelson
> * Aki Nishimura +
> * OGordon100 +
> * Petar Mlinarić +
> * Stefan Peterson
> * Matti Picus +
> * Ilhan Polat
> * Aaron Pries +
> * Matteo Ravasi +
> * Tyler Reddy
> * Ashton Reimer +
> * Joscha Reimer
> * rfezzani +
> * Riadh +
> * Lucas Roberts
> * Heshy Roskes +
> * Mirko Scholz +
> * Taylor D. Scott +
> * Srikrishna Sekhar +
> * Kevin Sheppard +
> * Sourav Singh
> * skjerns +
> * Kai Striega
> * SyedSaifAliAlvi +
> * Gopi Manohar T +
> * Albert Thomas +
> * Timon +
> * Paul van Mulbregt
&

Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Warren Weckesser
On 6/24/19, Marten van Kerkwijk  wrote:
> Hi Eric,
>
> The easiest definitely is for the mask to just propagate, which that even
> if just one point is masked, all points in the fft will be masked.
>
> On the direct point I made, I think it is correct that since one can think
> of the Fourier transform of a sine/cosine fit, then there is a solution
> even in the presence of some masked data, and this solution is distinct
> from that for a specific choice of fill value. But of course it is also
> true that the solution will be at least partially degenerate in its result
> and possibly indeterminate (e.g., for the extreme example of a real
> transform for which all but the first point are masked, all cosine term
> amplitudes are equal to the value of the first term, and are completely
> degenerate with each other, and all sine term amplitudes are indeterminate;
> one has only one piece of information, after all). Yet the inverse of any
> of those choices reproduces the input. That said, clearly there is a choice
> to be made whether this solution is at all interesting, which means that
> you are right that it needs an explicit user decision.
>

FWIW: The discrete Fourier transform is equivalent to a matrix
multiplication (https://en.wikipedia.org/wiki/DFT_matrix,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html),
so whatever behavior you define for a nonmasked array times a masked
vector also applies to the FFT of a masked vector.

Warren


> All the best,
>
> Marten
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NEP 32: Remove the financial functions from NumPy

2019-09-03 Thread Warren Weckesser
Github issue 2880 ("Get financial functions out of main namespace",
https://github.com/numpy/numpy/issues/2880) has been open since 2013. In a
recent community meeting, it was suggested that we create a NEP to propose
the removal of the financial functions from NumPy.  I have submitted "NEP
32:  Remove the financial functions from NumPy" in a pull request at
https://github.com/numpy/numpy/pull/14399.  A copy of the latest version of
the NEP is below.

According to the NEP process document, "Once the PR is in place, the NEP
should be announced on the mailing list for discussion (comments on the PR
itself should be restricted to minor editorial and technical fixes)."  This
email is the announcement for NEP 32.

The NEP includes a brief summary of the history of the financial functions,
and has links to several relevant mailing list threads, dating back to when
the functions were added to NumPy in 2008.  I recommend reviewing those
threads before commenting here.

Warren

-

==
NEP 32 — Remove the financial functions from NumPy
==========

:Author: Warren Weckesser 
:Status: Draft
:Type: Standards Track
:Created: 2019-08-30


Abstract


We propose deprecating and ultimately removing the financial functions [1]_
from NumPy.  The functions will be moved to an independent repository,
and provided to the community as a separate package with the name
``numpy_financial``.


Motivation and scope


The NumPy financial functions [1]_ are the 10 functions ``fv``, ``ipmt``,
``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and
``rate``.
The functions provide elementary financial calculations such as future
value,
net present value, etc. These functions were added to NumPy in 2008 [2]_.

In May, 2009, a request by Joe Harrington to add a function called ``xirr``
to
the financial functions triggered a long thread about these functions [3]_.
One important point that came up in that thread is that a "real" financial
library must be able to handle real dates.  The NumPy financial functions do
not work with actual dates or calendars.  The preference for a more capable
library independent of NumPy was expressed several times in that thread.

In June, 2009, D. L. Goldsmith expressed concerns about the correctness of
the
implementations of some of the financial functions [4]_.  It was suggested
then
to move the financial functions out of NumPy to an independent package.

In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the
financial
functions from the top-level namespace to ``numpy.financial``.  He also
suggested giving the functions better names.  Responses at that time
included
the suggestion to deprecate them and move them from NumPy to a separate
package.  This issue is still open.

Later in 2013 [6]_, it was suggested on the mailing list that these
functions
be removed from NumPy.

The arguments for the removal of these functions from NumPy:

* They are too specialized for NumPy.
* They are not actually useful for "real world" financial calculations,
because
  they do not handle real dates and calendars.
* The definition of "correctness" for some of these functions seems to be a
  matter of convention, and the current NumPy developers do not have the
  background to judge their correctness.
* There has been little interest among past and present NumPy developers
  in maintaining these functions.

The main arguments for keeping the functions in NumPy are:

* Removing these functions will be disruptive for some users.  Current users
  will have to add the new ``numpy_financial`` package to their
dependencies,
  and then modify their code to use the new package.
* The functions provided, while not "industrial strength", are apparently
  similar to functions provided by spreadsheets and some calculators.
Having
  them available in NumPy makes it easier for some developers to migrate
their
  software to Python and NumPy.

It is clear from comments in the mailing list discussions and in the GitHub
issues that many current NumPy developers believe the benefits of removing
the functions outweigh the costs.  For example, from [5]_::

The financial functions should probably be part of a separate package
-- Charles Harris

If there's a better package we can point people to we could just
deprecate
them and then remove them entirely... I'd be fine with that too...
-- Nathaniel Smith

+1 to deprecate them. If no other package exists, it can be created if
someone feels the need for that.
-- Ralf Gommers

I feel pretty strongly that we should deprecate these. If nobody on
numpy’s
core team is interested in maintaining them, then it is purely a drag on
development for NumPy.
-- Stephan Hoyer

And from the 2013 mailing list discussion, about removing the functions fr

Re: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy

2019-09-08 Thread Warren Weckesser
On 9/4/19, Matthew Brett  wrote:
> Hi,
>
> Maybe worth asking over at the Pandas list?  I bet there are more
> Python / finance people over there.


OK, I sent a message to the PyData mailing list.

Warren


>
> Cheers,
>
> Matthew
>
> On Wed, Sep 4, 2019 at 7:11 PM Ilhan Polat  wrote:
>>
>> +1 on removing them from NumPy. I think there are plenty of alternatives
>> already so many that we might even consider deprecating them just like
>> SciPy misc module by pointing to alternatives.
>>
>> On Tue, Sep 3, 2019 at 6:38 PM Sebastian Berg 
>> wrote:
>>>
>>> On Tue, 2019-09-03 at 08:56 -0400, Warren Weckesser wrote:
>>> > Github issue 2880 ("Get financial functions out of main namespace",
>>>
>>> Very briefly, I am absolutely in favor of this.
>>>
>>> Keeping the functions in numpy seems more of a liability than help
>>> anyone. And this push is more likely to help users by spurring
>>> development on a good replacement, than a practically unmaintained
>>> corner of NumPy that may seem like it solves a problem, but probably
>>> does so very poorly.
>>>
>>> Moving them into a separate pip installable package seems like the best
>>> way forward until a better replacement, to which we can point users,
>>> comes up.
>>>
>>> - Sebastian
>>>
>>>
>>> > https://github.com/numpy/numpy/issues/2880) has been open since 2013.
>>> > In a recent community meeting, it was suggested that we create a NEP
>>> > to propose the removal of the financial functions from NumPy.  I have
>>> > submitted "NEP 32:  Remove the financial functions from NumPy" in a
>>> > pull request at https://github.com/numpy/numpy/pull/14399.  A copy of
>>> > the latest version of the NEP is below.
>>> >
>>> > According to the NEP process document, "Once the PR is in place, the
>>> > NEP should be announced on the mailing list for discussion (comments
>>> > on the PR itself should be restricted to minor editorial and
>>> > technical fixes)."  This email is the announcement for NEP 32.
>>> >
>>> > The NEP includes a brief summary of the history of the financial
>>> > functions, and has links to several relevant mailing list threads,
>>> > dating back to when the functions were added to NumPy in 2008.  I
>>> > recommend reviewing those threads before commenting here.
>>> >
>>> > Warren
>>> >
>>> > -
>>> >
>>> > ==
>>> > NEP 32 — Remove the financial functions from NumPy
>>> > ==
>>> >
>>> > :Author: Warren Weckesser 
>>> > :Status: Draft
>>> > :Type: Standards Track
>>> > :Created: 2019-08-30
>>> >
>>> >
>>> > Abstract
>>> > 
>>> >
>>> > We propose deprecating and ultimately removing the financial
>>> > functions [1]_
>>> > from NumPy.  The functions will be moved to an independent
>>> > repository,
>>> > and provided to the community as a separate package with the name
>>> > ``numpy_financial``.
>>> >
>>> >
>>> > Motivation and scope
>>> > 
>>> >
>>> > The NumPy financial functions [1]_ are the 10 functions ``fv``,
>>> > ``ipmt``,
>>> > ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and
>>> > ``rate``.
>>> > The functions provide elementary financial calculations such as
>>> > future value,
>>> > net present value, etc. These functions were added to NumPy in 2008
>>> > [2]_.
>>> >
>>> > In May, 2009, a request by Joe Harrington to add a function called
>>> > ``xirr`` to
>>> > the financial functions triggered a long thread about these functions
>>> > [3]_.
>>> > One important point that came up in that thread is that a "real"
>>> > financial
>>> > library must be able to handle real dates.  The NumPy financial
>>> > functions do
>>> > not work with actual dates or calendars.  The preference for a more
>>> > capable
>>> > library independent of NumPy was expressed several times in that
>>> > thread.
>>> >
>>> > In June, 2009, D

Re: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy

2019-09-11 Thread Warren Weckesser
On 9/9/19, D.S. McNeil  wrote:
> [coming over from the pydata post]
>
> I just checked about ~150KLOC of our Python code in a financial context,
> written by about twenty developers over about four years.  Almost every
> function uses numpy, sometimes directly and sometimes via pandas.
>
> It seems like these functions were never used anywhere, and the lead dev on
> one of the projects responded "never used them; didn't even know they
> exist".  I knew they existed, but even on the rare occasion I need the
> functionality I need better control over the dates, which means for
> practical purposes I need something which supports Series natively anyhow.
>
> As it is, they also clutter up the namespace in unfriendly ways: if there's
> going to be a top-level function called np.rate I don't think this is the
> one it should be.  Admittedly that's more an argument against their current
> location.
>
> Although it wouldn't be useful for us, I could imagine someone finding a
> package which provides numpy-compatible versions of the many OpenFormula
> (or
> whatever the spec is called) functions helpful.  Having numpy carry a tiny
> subset of them doesn't feel productive.
>
> +1 for removing them.
>
>
> Doug


Thanks Doug, that's useful feedback.

Warren


>
>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy

2019-09-11 Thread Warren Weckesser
On 9/3/19, Warren Weckesser  wrote:
> Github issue 2880 ("Get financial functions out of main namespace",
> https://github.com/numpy/numpy/issues/2880) has been open since 2013. In a
> recent community meeting, it was suggested that we create a NEP to propose
> the removal of the financial functions from NumPy.  I have submitted "NEP
> 32:  Remove the financial functions from NumPy" in a pull request at
> https://github.com/numpy/numpy/pull/14399.  A copy of the latest version of
> the NEP is below.


FYI, the NEP is now also available at
https://numpy.org/neps/nep-0032-remove-financial-functions.html.

Warren


>
> According to the NEP process document, "Once the PR is in place, the NEP
> should be announced on the mailing list for discussion (comments on the PR
> itself should be restricted to minor editorial and technical fixes)."  This
> email is the announcement for NEP 32.
>
> The NEP includes a brief summary of the history of the financial functions,
> and has links to several relevant mailing list threads, dating back to when
> the functions were added to NumPy in 2008.  I recommend reviewing those
> threads before commenting here.
>
> Warren
>
> -
>
> ==
> NEP 32 — Remove the financial functions from NumPy
> ==
>
> :Author: Warren Weckesser 
> :Status: Draft
> :Type: Standards Track
> :Created: 2019-08-30
>
>
> Abstract
> 
>
> We propose deprecating and ultimately removing the financial functions [1]_
> from NumPy.  The functions will be moved to an independent repository,
> and provided to the community as a separate package with the name
> ``numpy_financial``.
>
>
> Motivation and scope
> 
>
> The NumPy financial functions [1]_ are the 10 functions ``fv``, ``ipmt``,
> ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and
> ``rate``.
> The functions provide elementary financial calculations such as future
> value,
> net present value, etc. These functions were added to NumPy in 2008 [2]_.
>
> In May, 2009, a request by Joe Harrington to add a function called ``xirr``
> to
> the financial functions triggered a long thread about these functions [3]_.
> One important point that came up in that thread is that a "real" financial
> library must be able to handle real dates.  The NumPy financial functions
> do
> not work with actual dates or calendars.  The preference for a more capable
> library independent of NumPy was expressed several times in that thread.
>
> In June, 2009, D. L. Goldsmith expressed concerns about the correctness of
> the
> implementations of some of the financial functions [4]_.  It was suggested
> then
> to move the financial functions out of NumPy to an independent package.
>
> In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the
> financial
> functions from the top-level namespace to ``numpy.financial``.  He also
> suggested giving the functions better names.  Responses at that time
> included
> the suggestion to deprecate them and move them from NumPy to a separate
> package.  This issue is still open.
>
> Later in 2013 [6]_, it was suggested on the mailing list that these
> functions
> be removed from NumPy.
>
> The arguments for the removal of these functions from NumPy:
>
> * They are too specialized for NumPy.
> * They are not actually useful for "real world" financial calculations,
> because
>   they do not handle real dates and calendars.
> * The definition of "correctness" for some of these functions seems to be a
>   matter of convention, and the current NumPy developers do not have the
>   background to judge their correctness.
> * There has been little interest among past and present NumPy developers
>   in maintaining these functions.
>
> The main arguments for keeping the functions in NumPy are:
>
> * Removing these functions will be disruptive for some users.  Current
> users
>   will have to add the new ``numpy_financial`` package to their
> dependencies,
>   and then modify their code to use the new package.
> * The functions provided, while not "industrial strength", are apparently
>   similar to functions provided by spreadsheets and some calculators.
> Having
>   them available in NumPy makes it easier for some developers to migrate
> their
>   software to Python and NumPy.
>
> It is clear from comments in the mailing list discussions and in the GitHub
> issues that many current NumPy developers believe the benefits of removing
> the functions outweigh the costs.  For example, from [5]_::
>
> The financi

Re: [Numpy-discussion] Code review for adding axis argument to permutation and shuffle function

2019-09-13 Thread Warren Weckesser
On 7/4/19, Kexuan Sun  wrote:
> Hi,
>
> I would like to request a code review. The random.permutation and
> random.shuffle functions now can only shuffle along the first axis of a
> multi-dimensional array. I propose to add an axis argument for the
> functions and allow them to shuffle along a given axis. Here is the link
> to the PR (https://github.com/numpy/numpy/pull/13829).


Given the current semantics of 'shuffle', the proposed change makes
sense.  However, I would like to call attention to

https://github.com/numpy/numpy/issues/5173

and to the mailing list thread from 2014 that I started here:

https://mail.python.org/pipermail/numpy-discussion/2014-October/071340.html

The topic of those discussions was that the current behavior of
'shuffle' is often *not* what users want or expect.  What is often
desired is to shuffle each row (or column, or whatever dimension is
specified) *independently* of the others.  So if

 a = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]),

then randomly shuffling 'a' along axis=1 should shuffle each row
independently of the others, to create something like

 a = np.array([[2, 4, 0, 3, 1], [8, 6, 9, 7, 5], [11, 12, 10, 14, 13]])

An API for this was discussed (and of course that ran into the second
of the two hard problems in computer science, naming things).

Take a look at those discussions, and check that

https://github.com/numpy/numpy/pull/13829

fits in with the possible changes mentioned in those discussions.  If
we don't use the name 'shuffle' for the new random permutation
function(s), then the change in PR 13829 is a good one.  However, if
we want to try to reuse the name 'shuffle' to also allow independent
shuffling along an axis, then we have to be careful with how we
interpret the 'axis' argument.


Warren


>
> Thanks!
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Proposal to accept NEP 32: Remove the financial functions from NumPy

2019-09-19 Thread Warren Weckesser
NEP 32 is available at
https://numpy.org/neps/nep-0032-remove-financial-functions.html

Recent timeline:

   - 30-Aug-2019 - A pull request with NEP 32 submitted.
   - 03-Sep-2019 - Announcement of the NEP 32 pull request on the
   NumPy-Discussion mailing list, with the text of the NEP included in the
   email.
   - 08-Sep-2019 - NEP 32 announced on the PyData mailing list (not
   standard procedure, but suggested in a response to the email in
   NumPy-Discussion).
   - 09-Sep-2019 - NEP 32 pull request merged.
   - 11-Sep-2019 - Emails sent to the NumPy-Discussion and PyData mailing
   lists with links to the online version of the NEP.

Only one user (speaking for a group of 12 or so) expressed a preference for
keeping the functions in NumPy, and that user acknowledged "Probably not a
huge inconvenience if we would have to use another library". (The NEP
includes a plan to provide an alternative package for the functions.)
 Several other users were in favor of removing them.  Among the current
NumPy developers who have expressed an opinion, all are in favor of
removing the functions.

There have been no additional email responses since the reminder was sent
on September 11.

In accordance with NEP 0, I propose that the status of NEP 32 be changed to
*Accepted*. If there are no substantive objections within 7 days from this
email, then the NEP will be accepted; see NEP 0 for more details (
https://numpy.org/neps/nep-.html#how-a-nep-becomes-accepted).


Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DType Roadmap/NEP Discussion

2019-09-19 Thread Warren Weckesser
On 9/18/19, Sebastian Berg  wrote:
> Hi all,
>
> to try and make some progress towards a decision since the broad design
> is pretty much settling from my side. I am thinking about making a
> meeting, and suggest Monday at 11am Pacific Time (I am open to other
> times though).


That works for me.

Warren


>
> My hope is to get everyone interested on board, so that we can make an
> informed decision about the general direction very soon. So just reach
> out, or discuss on the mailing list as well.
>
> The current draft for an NEP is here:
> https://hackmd.io/kxuh15QGSjueEKft5SaMug?both
>
> There are some design goals that I would like to clear up. I would
> prefer to avoid deep discussions of some specific issues, since I think
> the important decision right now is that my general start is in the
> right direction.
>
> It is not an easy topic, so my plan would be try and briefly summarize
> that and then hopefully clarify any questions and then we can discuss
> why alternatives are rejected. The most important thing is maybe
> gathering concerns which need to be clarified before we can go towards
> accepting the general design ideas.
>
> The main point of the NEP draft is actually captured by the picture in
> the linked document: DTypes are classes (such as Float64) and what is
> attached to the array is an instance of that class " ">float64". Additionally, we would have AbstractDType classes which
> cannot be instantiated but define a type hierarchy.
>
> To list the main points:
>
> * DTypes are classes (corresponding to the current type number)
>
> * `arr.dtype` is an instances of its class, allowing to store
>   additional information such as a physical unit, the string length.
>
> * Most things are defined in special dtype slots similar to Pythons
>   type and number slots. They will be hidden and can be set through
>   an init function similar to `PyType_FromSpec` [1].
>
> * Promotion is defined primarily on the DType classes
>
> * Casting from one DType to another DType is defined by a new
>   CastingImpl object (should become a special ufunc)
> - e.g. for strings, the CastingImpl is in charge of finding the
>   correct string length
>
> * The AbstractDType hierarchy will be used to decide the signature when
>   calling UFuncs.
>
>
> The main iffier points I can think of are:
>
> * NumPy currently uses value based promotion in some cases, which
>   requires special AbstractDTypes to describe (and some legacy
>   paths). (They are used use more like instances than typical classes)
>
> * Casting between flexible dtypes (such as strings) is a multi-step
>   process to figure out the actual output dtype.
> - An example is: `np.can_cast("float64", "S3")` first finding
>   that `Float64->String` is possible in principle and then
>   asking the CastingImpl to find that `float64->S3` is not.
>
> * We have to break ABI compatibility in very minor, back-portable
>   way. More smaller incompatibilities are likely [2].
>
> * Since it is a major redesign, a lot of code has to be added/touched,
>   although it is possible to channel much of it back into the old
>   machinery.
>
> * A largish amount of new API around new DType type objects and also
>   DTypeMeta type objects, which users can (although usually do not have
>   to) subclass.
>
> However, most other designs will have similar issues. Basically, I
> currently really think this is "right", even if some details may end up
> a tricky.
>
> Best,
>
> Sebastian
>
>
> PS: The one thing outside the more general list above that I may want
> to discuss is how acceptable a global dict/mapping for dtype discovery
> during `np.array` coercion is (mapping python type -> dtype)...
>
>
> [1] https://docs.python.org/3/c-api/type.html#c.PyType_FromSpec
> [2] One possible issue may be "S0" which is normally used to denote
> what in the new API would be the `String` DType class.
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [pydata] NumPy proposal to remove the financial functions.

2019-09-22 Thread Warren Weckesser
On 9/21/19, Brendan Barnwell  wrote:
> Hi Warren,
>
> I'm somewhat late to this discussion but I too have used the financial
> functions.  I looked at the discussion and the NEP and one thing I don't
> understand is how the maintenance burden is alleviated if the functions are
>
> moved to a separate library.  Is the intent of the Numpy devs to just
> "dump" these functions into numpy_financial and then not maintain them?  If
>
> not, what is achieved by moving them out of numpy?


Brendan,

There have been some more recent comments on the github issue that are
relevant; take a look:

https://github.com/numpy/numpy/issues/2880

It is true that when the functions are moved to numpy_financial, they
will receive less attention from the core NumPy developers.  Indeed,
that is the point of the move.  As you can see from the comments in
the github issue and those quoted in the NEP, there is no interest
among the current developers in maintaining these functions in NumPy.

By having a smaller and more focused library that is explicitly for
financial functions, it is possible that new developers with greater
interest and expertise in that domain will be motivated to contribute.
See, for example, Graham Duncan's recent comments in the github issue.
It remains to be seen whether we'll end up with a significantly
*better* library for financial calculations once the transition is
complete.

For the most visibility among the NumPy developers, it would be best
to continue the conversation in a NumPy venue, either the github issue
or the NumPy mailing list.  I've cc'ed this email to the NumPy mailing
list.

Warren


>
> On Thursday, September 19, 2019 at 8:25:52 AM UTC-7, Warren Weckesser
> wrote:
>>
>> On 9/8/19, Warren Weckesser > wrote:
>> > NumPy is considering a NEP (NumPy Enhancement Proposal) that proposes
>> the
>> > deprecation and ultimate removal of the financial functions from NumPy.
>> >
>> > The functions would be moved to an independent library.  The mailing
>> list
>> > discussion of this proposal is at
>> >
>> >
>> >
>> http://numpy-discussion.10968.n7.nabble.com/NEP-32-Remove-the-financial-functions-from-NumPy-tt47456.html
>>
>> >
>> > or
>> >
>> >
>> >
>> https://mail.python.org/pipermail/numpy-discussion/2019-September/079965.html
>>
>> >
>> > The first message in that thread includes the proposed NEP.
>> >
>> > There have been a couple suggestions to ask about this on the Pandas
>> > mailing list.  Contributions to the thread in the numpy-discussion
>> mailing
>> > list would be appreciated!
>>
>>
>> FYI:  The proposal to accept the NEP to remove the financial functions
>> has been made on the NumPy-Discussion mailing list:
>>
>> https://mail.python.org/pipermail/numpy-discussion/2019-September/080074.html
>>
>>
>> Warren
>>
>> >
>> > Thanks,
>> >
>> > Warren
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "PyData" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to pyd...@googlegroups.com .
>> > To view this discussion on the web visit
>> >
>> https://groups.google.com/d/msgid/pydata/8458a06f-ead7-4f7e-b288-4a15a6002482%40googlegroups.com.
>>
>>
>> >
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "PyData" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pydata+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pydata/8983e694-7067-44cb-a35f-5c173d44c160%40googlegroups.com.
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] NEP 32 is accepted. Now the work begins...

2019-09-27 Thread Warren Weckesser
NumPy devs,

NEP 32 to remove the financial functions
(https://numpy.org/neps/nep-0032-remove-financial-functions.html) has
been accepted.  The next step is to create the numpy-financial package
that will replace them.  The repository for the new package is
https://github.com/numpy/numpy-financial.

I have a work-in-progress pull request there to get the initial
structure set up.  Reviews of the PR would be helpful, as would
contributions to set up Sphinx-based documentation, continuous
integration, PyPI packaging, and anything else that goes into setting
up a "proper" package.   Any help would be greatly appreciated!


Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 32 is accepted. Now the work begins...

2019-09-28 Thread Warren Weckesser
On 9/27/19, Warren Weckesser  wrote:
> NumPy devs,
>
> NEP 32 to remove the financial functions
> (https://numpy.org/neps/nep-0032-remove-financial-functions.html) has
> been accepted.


CI gurus: the web page containing the rendered NEPs,
https://numpy.org/neps/, has not updated since the pull request that
changed the status of NEP 32 to Accepted was merged
(https://github.com/numpy/numpy/pull/14600).  Does something else need
to be done to get that page to regenerate?

Warren


  The next step is to create the numpy-financial package
> that will replace them.  The repository for the new package is
> https://github.com/numpy/numpy-financial.
>
> I have a work-in-progress pull request there to get the initial
> structure set up.  Reviews of the PR would be helpful, as would
> contributions to set up Sphinx-based documentation, continuous
> integration, PyPI packaging, and anything else that goes into setting
> up a "proper" package.   Any help would be greatly appreciated!
>
>
> Warren
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Forcing gufunc to error with size zero input

2019-09-28 Thread Warren Weckesser
I'm experimenting with gufuncs, and I just created a simple one with
signature '(i)->()'.  Is there a way to configure the gufunc itself so
that an empty array results in an error?  Or would I have to create a
Python wrapper around the gufunc that does the error checking?
Currently, when passed an empty array, the ufunc loop is called with
the core dimension associated with i set to 0.  It would be nice if
the code didn't get that far, and the ufunc machinery "knew" that this
gufunc didn't accept a core dimension that is 0.  I'd like to
automatically get an error, something like the error produced by
`np.max([])`.

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 32 is accepted. Now the work begins...

2019-09-28 Thread Warren Weckesser
On 9/28/19, Sebastian Berg  wrote:
> On Sat, 2019-09-28 at 13:15 -0400, Warren Weckesser wrote:
>> On 9/27/19, Warren Weckesser  wrote:
>> > NumPy devs,
>> >
>> > NEP 32 to remove the financial functions
>> > (https://numpy.org/neps/nep-0032-remove-financial-functions.html)
>> > has
>> > been accepted.
>>
>> CI gurus: the web page containing the rendered NEPs,
>> https://numpy.org/neps/, has not updated since the pull request that
>> changed the status of NEP 32 to Accepted was merged
>> (https://github.com/numpy/numpy/pull/14600).  Does something else
>> need
>> to be done to get that page to regenerate?
>>
>
> I pushed an empty commit to trigger deployment. That should happen
> automatically (as it does OK for the devdocs). I do not know why it
> does not work, and github did not yet answer my service request on it.
>


Thanks Sebastian.  The NEPs web page is updated now.

Warren


> - Sebastian
>
>
>> Warren
>>
>>
>>   The next step is to create the numpy-financial package
>> > that will replace them.  The repository for the new package is
>> > https://github.com/numpy/numpy-financial.
>> >
>> > I have a work-in-progress pull request there to get the initial
>> > structure set up.  Reviews of the PR would be helpful, as would
>> > contributions to set up Sphinx-based documentation, continuous
>> > integration, PyPI packaging, and anything else that goes into
>> > setting
>> > up a "proper" package.   Any help would be greatly appreciated!
>> >
>> >
>> > Warren
>> >
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Forcing gufunc to error with size zero input

2019-09-28 Thread Warren Weckesser
On 9/28/19, Eric Wieser  wrote:
> Can you just raise an exception in the gufuncs inner loop? Or is there no
> mechanism to do that today?

Maybe?  I don't know what is the idiomatic way to handle errors
detected in an inner loop.  And pushing this particular error
detection into the inner loop doesn't feel right.


>
> I don't think you were proposing that core dimensions should _never_ be
> allowed to be 0,


No, I'm not suggesting that.  There are many cases where a length 0
core dimension is fine.

I'm interested in the case where there is not a meaningful definition
of the operation on the empty set.  The mean is an example.  Currently
`np.mean([])` generates two warnings (one useful, the other cryptic
and apparently incidental), and returns nan.  Returning nan is one way
to handle such a case; another is to raise an error like `np.amax([])`
does.  I'd like to raise an error in the example that I'm working on
('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The
function is a gufunc, not a reduction of a binary operation, so the
'identity' argument  of PyUFunc_FromFuncAndDataAndSignature has no
effect.

> but if you were I disagree. I spent a fair amount of work
> enabling that for linalg because it provided some convenient base cases.
>
> We could go down the route of augmenting the gufuncs signature syntax to
> support requiring non-empty dimensions, like we did for optional ones -
> although IMO we should consider switching from a string minilanguage to a
> structured object specification if we plan to go too much further with
> extending it.

After only a quick glance at that code: one option is to add a '+'
after the input names in the signature that must have a length that is
at least 1.  So the signature for functions like `mean` (if you were
to reimplement it as a gufunc, and wanted an error instead of nan),
`amax`, `ptp`, etc, would be '(i+)->()'.

However, the only meaningful uses-cases of this enhancement that I've
come up with are these simple reductions.  So I don't know if making
such a change to the signature is worthwhile.  On the other hand,
there are many examples of useful 1-d reductions that aren't the
reduction of an associative binary operation.  It might be worthwhile
to have a new convenience function just for the case '(i)->()', maybe
something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's
ugly, but I think you get the idea), and that function can have an
argument to specify that the length must be at least 1.

I'll see if that is feasible, but I won't be surprised to learn that
there are good reasons for *not* doing that.

Warren



>
> On Sat, Sep 28, 2019, 17:47 Warren Weckesser 
> wrote:
>
>> I'm experimenting with gufuncs, and I just created a simple one with
>> signature '(i)->()'.  Is there a way to configure the gufunc itself so
>> that an empty array results in an error?  Or would I have to create a
>> Python wrapper around the gufunc that does the error checking?
>> Currently, when passed an empty array, the ufunc loop is called with
>> the core dimension associated with i set to 0.  It would be nice if
>> the code didn't get that far, and the ufunc machinery "knew" that this
>> gufunc didn't accept a core dimension that is 0.  I'd like to
>> automatically get an error, something like the error produced by
>> `np.max([])`.
>>
>> Warren
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Forcing gufunc to error with size zero input

2019-09-28 Thread Warren Weckesser
On 9/29/19, Warren Weckesser  wrote:
> On 9/28/19, Eric Wieser  wrote:
>> Can you just raise an exception in the gufuncs inner loop? Or is there no
>> mechanism to do that today?
>
> Maybe?  I don't know what is the idiomatic way to handle errors
> detected in an inner loop.  And pushing this particular error
> detection into the inner loop doesn't feel right.
>
>
>>
>> I don't think you were proposing that core dimensions should _never_ be
>> allowed to be 0,
>
>
> No, I'm not suggesting that.  There are many cases where a length 0
> core dimension is fine.
>
> I'm interested in the case where there is not a meaningful definition
> of the operation on the empty set.  The mean is an example.  Currently
> `np.mean([])` generates two warnings (one useful, the other cryptic
> and apparently incidental), and returns nan.  Returning nan is one way
> to handle such a case; another is to raise an error like `np.amax([])`
> does.  I'd like to raise an error in the example that I'm working on
> ('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The
> function is a gufunc, not a reduction of a binary operation, so the
> 'identity' argument  of PyUFunc_FromFuncAndDataAndSignature has no
> effect.
>
>> but if you were I disagree. I spent a fair amount of work
>> enabling that for linalg because it provided some convenient base cases.
>>
>> We could go down the route of augmenting the gufuncs signature syntax to
>> support requiring non-empty dimensions, like we did for optional ones -
>> although IMO we should consider switching from a string minilanguage to a
>> structured object specification if we plan to go too much further with
>> extending it.
>
> After only a quick glance at that code: one option is to add a '+'
> after the input names in the signature that must have a length that is
> at least 1.  So the signature for functions like `mean` (if you were
> to reimplement it as a gufunc, and wanted an error instead of nan),
> `amax`, `ptp`, etc, would be '(i+)->()'.
>
> However, the only meaningful uses-cases of this enhancement that I've
> come up with are these simple reductions.


Of course, just minutes after sending the email, I realized I *do*
know of other signatures that could benefit from a check on the core
dimension size.  An implementation of Pearson's correlation
coefficient as a gufunc would have signature (i),(i)->(), and the core
dimension i must be at least *2* for the calculation to be well
defined.  Other correlations would also likely require a nonzero core
dimension.

Warren



>  So I don't know if making
> such a change to the signature is worthwhile.  On the other hand,
> there are many examples of useful 1-d reductions that aren't the
> reduction of an associative binary operation.  It might be worthwhile
> to have a new convenience function just for the case '(i)->()', maybe
> something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's
> ugly, but I think you get the idea), and that function can have an
> argument to specify that the length must be at least 1.
>
> I'll see if that is feasible, but I won't be surprised to learn that
> there are good reasons for *not* doing that.
>
> Warren
>
>
>
>>
>> On Sat, Sep 28, 2019, 17:47 Warren Weckesser 
>> wrote:
>>
>>> I'm experimenting with gufuncs, and I just created a simple one with
>>> signature '(i)->()'.  Is there a way to configure the gufunc itself so
>>> that an empty array results in an error?  Or would I have to create a
>>> Python wrapper around the gufunc that does the error checking?
>>> Currently, when passed an empty array, the ufunc loop is called with
>>> the core dimension associated with i set to 0.  It would be nice if
>>> the code didn't get that far, and the ufunc machinery "knew" that this
>>> gufunc didn't accept a core dimension that is 0.  I'd like to
>>> automatically get an error, something like the error produced by
>>> `np.max([])`.
>>>
>>> Warren
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Error handling in a ufunc inner loop.

2019-09-29 Thread Warren Weckesser
This is a new thread to address the question of error handling in a ufunc
loop that was brought up in the thread on handling core dimensions of
length zero.  I'm attempting to answer my own question about the idiomatic
way to handle an error in an inner loop.

The use of the GIL with a ufunc loop is documented at


https://numpy.org/devdocs/reference/internals.code-explanations.html#function-call

So an inner loop is running without the GIL if the macro NPY_ALLOW_THREADS
is defined and the loop is not an object-type loop.

If the inner loop is running without the GIL, it must acquire the GIL
before calling, say, PyErr_SetString to set an exception.  The NumPy macros
for acquiring the GIL are documented at

https://docs.scipy.org/doc/numpy/reference/c-api.array.html#group-2

These macros are defined in numpy/core/include/numpy/ndarraytypes.h.  If
NPY_ALLOW_THREADS is defined, these macros wrap calls to
PyGILState_Ensure() and PyGILState_Release() (
https://docs.python.org/3/c-api/init.html#non-python-created-threads):

```
#define NPY_ALLOW_C_API_DEF  PyGILState_STATE __save__;
#define NPY_ALLOW_C_API  do {__save__ = PyGILState_Ensure();} while (0);
#define NPY_DISABLE_C_APIdo {PyGILState_Release(__save__);} while (0);
```

If NPY_ALLOW_THREADS is not defined, those macros are defined with empty
values.

Now suppose I want to change the following inner loop to set an exception
instead of returning nan when the input is negative:

```
static void
logfactorial_loop(char **args, npy_intp *dimensions,
  npy_intp* steps, void* data)
{
char *in = args[0];
char *out = args[1];
npy_intp in_step = steps[0];
npy_intp out_step = steps[1];

for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out +=
out_step) {
int64_t x = *(int64_t *)in;
if (x < 0) {
*((double *)out) = NAN;
}
else {
*((double *)out) = logfactorial(x);
}
}
}
```

Based on the documentation linked above, the changed inner loop is simply:

```
static void
logfactorial_loop(char **args, npy_intp *dimensions,
  npy_intp* steps, void* data)
{
char *in = args[0];
char *out = args[1];
npy_intp in_step = steps[0];
npy_intp out_step = steps[1];

for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out +=
out_step) {
int64_t x = *(int64_t *)in;
if (x < 0) {
NPY_ALLOW_C_API_DEF
NPY_ALLOW_C_API
PyErr_SetString(PyExc_ValueError, "math domain error in
logfactorial: x < 0");
NPY_DISABLE_C_API
return;
}
else {
*((double *)out) = logfactorial(x);
}
}
}
```

That worked as expected, but I haven't tried it yet with a NumPy
installation where NPY_ALLOW_THREADS is not defined.

Is that change correct?  Would that be considered the (or an) idiomatic way
to handle errors in an inner loop?  Are there any potential problems that
I'm missing?

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Forcing gufunc to error with size zero input

2019-09-29 Thread Warren Weckesser
On 9/29/19, Warren Weckesser  wrote:
> On 9/28/19, Eric Wieser  wrote:
>> Can you just raise an exception in the gufuncs inner loop? Or is there no
>> mechanism to do that today?
>
> Maybe?  I don't know what is the idiomatic way to handle errors
> detected in an inner loop.  And pushing this particular error
> detection into the inner loop doesn't feel right.
>
>
>>
>> I don't think you were proposing that core dimensions should _never_ be
>> allowed to be 0,
>
>
> No, I'm not suggesting that.  There are many cases where a length 0
> core dimension is fine.
>
> I'm interested in the case where there is not a meaningful definition
> of the operation on the empty set.  The mean is an example.  Currently
> `np.mean([])` generates two warnings (one useful, the other cryptic
> and apparently incidental), and returns nan.  Returning nan is one way
> to handle such a case; another is to raise an error like `np.amax([])`
> does.  I'd like to raise an error in the example that I'm working on
> ('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The


FYI: I renamed that repository to 'ufunclab':
https://github.com/WarrenWeckesser/ufunclab

Warren


> function is a gufunc, not a reduction of a binary operation, so the
> 'identity' argument  of PyUFunc_FromFuncAndDataAndSignature has no
> effect.
>
>> but if you were I disagree. I spent a fair amount of work
>> enabling that for linalg because it provided some convenient base cases.
>>
>> We could go down the route of augmenting the gufuncs signature syntax to
>> support requiring non-empty dimensions, like we did for optional ones -
>> although IMO we should consider switching from a string minilanguage to a
>> structured object specification if we plan to go too much further with
>> extending it.
>
> After only a quick glance at that code: one option is to add a '+'
> after the input names in the signature that must have a length that is
> at least 1.  So the signature for functions like `mean` (if you were
> to reimplement it as a gufunc, and wanted an error instead of nan),
> `amax`, `ptp`, etc, would be '(i+)->()'.
>
> However, the only meaningful uses-cases of this enhancement that I've
> come up with are these simple reductions.  So I don't know if making
> such a change to the signature is worthwhile.  On the other hand,
> there are many examples of useful 1-d reductions that aren't the
> reduction of an associative binary operation.  It might be worthwhile
> to have a new convenience function just for the case '(i)->()', maybe
> something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's
> ugly, but I think you get the idea), and that function can have an
> argument to specify that the length must be at least 1.
>
> I'll see if that is feasible, but I won't be surprised to learn that
> there are good reasons for *not* doing that.
>
> Warren
>
>
>
>>
>> On Sat, Sep 28, 2019, 17:47 Warren Weckesser 
>> wrote:
>>
>>> I'm experimenting with gufuncs, and I just created a simple one with
>>> signature '(i)->()'.  Is there a way to configure the gufunc itself so
>>> that an empty array results in an error?  Or would I have to create a
>>> Python wrapper around the gufunc that does the error checking?
>>> Currently, when passed an empty array, the ufunc loop is called with
>>> the core dimension associated with i set to 0.  It would be nice if
>>> the code didn't get that far, and the ufunc machinery "knew" that this
>>> gufunc didn't accept a core dimension that is 0.  I'd like to
>>> automatically get an error, something like the error produced by
>>> `np.max([])`.
>>>
>>> Warren
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Error handling in a ufunc inner loop.

2019-09-29 Thread Warren Weckesser
On 9/29/19, Warren Weckesser  wrote:
> This is a new thread to address the question of error handling in a ufunc
> loop that was brought up in the thread on handling core dimensions of
> length zero.  I'm attempting to answer my own question about the idiomatic
> way to handle an error in an inner loop.
>
> The use of the GIL with a ufunc loop is documented at
>
>
> https://numpy.org/devdocs/reference/internals.code-explanations.html#function-call
>
> So an inner loop is running without the GIL if the macro NPY_ALLOW_THREADS
> is defined and the loop is not an object-type loop.
>
> If the inner loop is running without the GIL, it must acquire the GIL
> before calling, say, PyErr_SetString to set an exception.  The NumPy macros
> for acquiring the GIL are documented at
>
> https://docs.scipy.org/doc/numpy/reference/c-api.array.html#group-2
>
> These macros are defined in numpy/core/include/numpy/ndarraytypes.h.  If
> NPY_ALLOW_THREADS is defined, these macros wrap calls to
> PyGILState_Ensure() and PyGILState_Release() (
> https://docs.python.org/3/c-api/init.html#non-python-created-threads):
>
> ```
> #define NPY_ALLOW_C_API_DEF  PyGILState_STATE __save__;
> #define NPY_ALLOW_C_API  do {__save__ = PyGILState_Ensure();} while
> (0);
> #define NPY_DISABLE_C_APIdo {PyGILState_Release(__save__);} while (0);
> ```
>
> If NPY_ALLOW_THREADS is not defined, those macros are defined with empty
> values.
>
> Now suppose I want to change the following inner loop to set an exception
> instead of returning nan when the input is negative:
>
> ```
> static void
> logfactorial_loop(char **args, npy_intp *dimensions,
>   npy_intp* steps, void* data)
> {
> char *in = args[0];
> char *out = args[1];
> npy_intp in_step = steps[0];
> npy_intp out_step = steps[1];
>
> for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out +=
> out_step) {
> int64_t x = *(int64_t *)in;
> if (x < 0) {
> *((double *)out) = NAN;
> }
> else {
> *((double *)out) = logfactorial(x);
> }
> }
> }
> ```
>
> Based on the documentation linked above, the changed inner loop is simply:
>
> ```
> static void
> logfactorial_loop(char **args, npy_intp *dimensions,
>   npy_intp* steps, void* data)
> {
> char *in = args[0];
> char *out = args[1];
> npy_intp in_step = steps[0];
> npy_intp out_step = steps[1];
>
> for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out +=
> out_step) {
> int64_t x = *(int64_t *)in;
> if (x < 0) {
> NPY_ALLOW_C_API_DEF
> NPY_ALLOW_C_API
> PyErr_SetString(PyExc_ValueError, "math domain error in
> logfactorial: x < 0");
> NPY_DISABLE_C_API
> return;
> }
> else {
> *((double *)out) = logfactorial(x);
> }
> }
> }
> ```
>
> That worked as expected, but I haven't tried it yet with a NumPy
> installation where NPY_ALLOW_THREADS is not defined.
>
> Is that change correct?  Would that be considered the (or an) idiomatic way
> to handle errors in an inner loop?  Are there any potential problems that
> I'm missing?


Sebastian Berg pointed out to me that exactly this pattern is used in
NumPy, for example,


https://github.com/numpy/numpy/blob/68bd6e359a6b0863acf39cad637e1444d78eabd0/numpy/core/src/umath/loops.c.src#L913

So I'll take that as a yes, that's the way (or at least a way) to do it.

Warren


>
> Warren
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] State of numpy-financial

2019-11-15 Thread Warren Weckesser
On 11/15/19, Marcelo Gasparian Gosling  wrote:
> Hi everyone!
>
> So, np-financial is being phased out of mainline Numpy, right? I have a
> patch for the IRR function that I'd like to submit, is it just as easy as
> making a PR on Github?


Hi Marcelo,

The new home for the NumPy financial functions is
https://github.com/numpy/numpy-financial

Pull requests are welcome!

Warren


>
> Cheers,
>
> Marcelo
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy/windows-wheel-builder repository

2019-12-03 Thread Warren Weckesser
It looks like the repo https://github.com/numpy/windows-wheel-builder
is defunct.  Could someone with the appropriate access privileges
merge Matti's pull request to update README.rst
(https://github.com/numpy/windows-wheel-builder/pull/7), and add a
one-line description to the main repo page that says something like
"INACTIVE - this repo is no longer maintained"?

Thanks,

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] ANN: SciPy 1.4.0

2019-12-16 Thread Warren Weckesser
otation` methods ``from_dcm``, ``as_dcm`` were renamed
> to
> ``from_matrix``, ``as_matrix`` respectively. The old names will be removed
> in
> SciPy 1.6.0.
>
> Method ``Rotation.match_vectors`` was deprecated in favor of
> ``Rotation.align_vectors``, which provides a more logical and
> general API to the same functionality. The old method
> will be removed in SciPy 1.6.0.
>
> Backwards incompatible changes
> ==
>
> `scipy.special` changes
> -
> The deprecated functions ``hyp2f0``, ``hyp1f2``, and ``hyp3f0`` have been
> removed.
>
> The deprecated function ``bessel_diff_formula`` has been removed.
>
> The function ``i0`` is no longer registered with ``numpy.dual``, so that
> ``numpy.dual.i0`` will unconditionally refer to the NumPy version
> regardless
> of whether `scipy.special` is imported.
>
> The function ``expn`` has been changed to return ``nan`` outside of its
> domain of definition (``x, n < 0``) instead of ``inf``.
>
> `scipy.sparse` changes
> 
> Sparse matrix reshape now raises an error if shape is not two-dimensional,
> rather than guessing what was meant. The behavior is now the same as before
> SciPy 1.1.0.
>
> ``CSR`` and ``CSC`` sparse matrix classes should now return empty matrices
> of the same type when indexed out of bounds. Previously, for some versions
> of SciPy, this would raise an ``IndexError``. The change is largely
> motivated
> by greater consistency with ``ndarray`` and ``numpy.matrix`` semantics.
>
> `scipy.signal` changes
> ---
> `scipy.signal.resample` behavior for length-1 signal inputs has been
> fixed to output a constant (DC) value rather than an impulse, consistent
> with
> the assumption of signal periodicity in the FFT method.
>
> `scipy.signal.cwt` now performs complex conjugation and time-reversal of
> wavelet data, which is a backwards-incompatible bugfix for
> time-asymmetric wavelets.
>
> `scipy.stats` changes
> --
> `scipy.stats.loguniform` added with better documentation as (an alias for
> ``scipy.stats.reciprocal``). ``loguniform`` generates random variables
> that are equally likely in the log space; e.g., ``1``, ``10`` and ``100``
> are all equally likely if ``loguniform(10 ** 0, 10 ** 2).rvs()`` is used.
>
>
> Other changes
> =
> The ``LSODA`` method of `scipy.integrate.solve_ivp` now correctly detects
> stiff
> problems.
>
> `scipy.spatial.cKDTree` now accepts and correctly handles empty input data
>
> `scipy.stats.binned_statistic_dd` now calculates the standard deviation
> statistic in a numerically stable way.
>
> `scipy.stats.binned_statistic_dd` now throws an error if the input data
> contains either ``np.nan`` or ``np.inf``. Similarly, in `scipy.stats` now
> all
> continuous distributions' ``.fit()`` methods throw an error if the input
> data
> contain any instance of either ``np.nan`` or ``np.inf``.
>
>
> Authors
> ===
>
> * @endolith
> * @wenhui-prudencemed +
> * Abhinav +
> * Anne Archibald
> * ashwinpathak20nov1996 +
> * Danilo Augusto +
> * Nelson Auner +
> * aypiggott +
> * Christoph Baumgarten
> * Peter Bell
> * Sebastian Berg
> * Arman Bilge +
> * Benedikt Boecking +
> * Christoph Boeddeker +
> * Daniel Bunting
> * Evgeni Burovski
> * Angeline Burrell +
> * Angeline G. Burrell +
> * CJ Carey
> * Carlos Ramos Carreño +
> * Mak Sze Chun +
> * Malayaja Chutani +
> * Christian Clauss +
> * Jonathan Conroy +
> * Stephen P Cook +
> * Dylan Cutler +
> * Anirudh Dagar +
> * Aidan Dang +
> * dankleeman +
> * Brandon David +
> * Tyler Dawson +
> * Dieter Werthmüller
> * Joe Driscoll +
> * Jakub Dyczek +
> * Dávid Bodnár
> * Fletcher Easton +
> * Stefan Endres
> * etienne +
> * Johann Faouzi
> * Yu Feng
> * Isuru Fernando +
> * Matthew H Flamm
> * Martin Gauch +
> * Gabriel Gerlero +
> * Ralf Gommers
> * Chris Gorgolewski +
> * Domen Gorjup +
> * Edouard Goudenhoofdt +
> * Jan Gwinner +
> * Maja Gwozdz +
> * Matt Haberland
> * hadshirt +
> * Pierre Haessig +
> * David Hagen
> * Charles Harris
> * Gina Helfrich +
> * Alex Henrie +
> * Francisco J. Hernandez Heras +
> * Andreas Hilboll
> * Lindsey Hiltner
> * Thomas Hisch
> * Min ho Kim +
> * Gert-Ludwig Ingold
> * jakobjakobson13 +
> * Todd Jennings
> * He Jia
> * Muhammad Firmansyah Kasim +
> * Andrew Knyazev +
> * Holger Kohr +
> * Mateusz Konieczny +
> * Krzysztof Pióro +
> * Philipp Lang +
> * Peter Mahler Larsen +
> * Eric Larson
> * Antony Lee
> * Gregory R. Lee
> * Chelsea Li

[Numpy-discussion] NumPy Development Meeting - Triage Focus

2019-12-17 Thread Warren Weckesser
Hi all,

Our bi-weekly triage-focused NumPy development meeting is tomorrow
(Wednesday, December 18) at 11 am Pacific Time. Everyone is invited
to join in and edit the work-in-progress meeting topics and notes:
https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

Best regards,

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] ANN: SciPy 1.4.1

2019-12-19 Thread Warren Weckesser
On 12/19/19, Tyler Reddy  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Hi all,
>
> On behalf of the SciPy development team I'm pleased to announce
> the release of SciPy 1.4.1, which is a bug fix release.


Thanks for the quick fix, Tyler!

Warren


>
> Sources and binary wheels can be found at:
> https://pypi.org/project/scipy/
> and at: https://github.com/scipy/scipy/releases/tag/v1.4.1
>
> One of a few ways to install this release with pip:
>
> pip install scipy==1.4.1
>
> ==
> SciPy 1.4.1 Release Notes
> ==
>
> SciPy 1.4.1 is a bug-fix release with no new features
> compared to 1.4.0. Importantly, it aims to fix a problem
> where an older version of pybind11 may cause a segmentation
> fault when imported alongside incompatible libraries.
>
> Authors
> ==
>
> * Ralf Gommers
> * Tyler Reddy
>
> Issues closed for 1.4.1
> -
>
> * `#11237 `__: Seg fault when
> importing torch
>
> Pull requests for 1.4.1
> 
>
> * `#11238 `__: BLD: update
> minimum pybind11 version to 2.4.0.
>
> Checksums
> =
>
> MD5
> ~~~
>
> 82a6df2d23315b9e7f7ab334ae4ed98d
>  scipy-1.4.1-cp35-cp35m-macosx_10_6_intel.whl
> 68a72f96918911586cc3b01566c8719a
> scipy-1.4.1-cp35-cp35m-manylinux1_i686.whl
> 644e69ec76bc34276117aa377df6b56b
>  scipy-1.4.1-cp35-cp35m-manylinux1_x86_64.whl
> 94a4cc9c9b0b9fdfd5159317a34ecf04  scipy-1.4.1-cp35-cp35m-win32.whl
> 00a88c31baa15561b726182b46a90bbf  scipy-1.4.1-cp35-cp35m-win_amd64.whl
> f1ae0ec2394531c043dd66a4d87644ae
>  scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl
> f02e63505e14c1c353f01bf5355bdb6b
> scipy-1.4.1-cp36-cp36m-manylinux1_i686.whl
> 200f038910b0f92671d2ff5cb170f51b
>  scipy-1.4.1-cp36-cp36m-manylinux1_x86_64.whl
> 5b2fb317f0105f1b6538a37405d6346e  scipy-1.4.1-cp36-cp36m-win32.whl
> 2820bc38feb01d1d8a161eb07000a5b2  scipy-1.4.1-cp36-cp36m-win_amd64.whl
> a26c022bb638cbb105789e9586032cc7
>  scipy-1.4.1-cp37-cp37m-macosx_10_6_intel.whl
> b84878cf6419acbcc6bf9dcce8ed1ff7
> scipy-1.4.1-cp37-cp37m-manylinux1_i686.whl
> b2a9ee8c5ee393f6a52eb387163ad785
>  scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl
> 6f1c29d57a33d2cfd2991672543afda9  scipy-1.4.1-cp37-cp37m-win32.whl
> 2d5e0b3953d4e0a141f8897b39fc70c8  scipy-1.4.1-cp37-cp37m-win_amd64.whl
> 5fedfcb8736f41938681c8e7ef5737b8
>  scipy-1.4.1-cp38-cp38-macosx_10_9_x86_64.whl
> 19ae0bc89a8a88045bfdcdac8eba300a  scipy-1.4.1-cp38-cp38-manylinux1_i686.whl
> 6ab0a834e656cd7314cfe28392fcebb4
>  scipy-1.4.1-cp38-cp38-manylinux1_x86_64.whl
> f8fd48b50c20fbc56e4af6c418b6c239  scipy-1.4.1-cp38-cp38-win32.whl
> 10b3e0755feb71100ed7a0a7c06ed69c  scipy-1.4.1-cp38-cp38-win_amd64.whl
> 3a97689656f33f67614000459ec08585  scipy-1.4.1.tar.gz
> 27608d42755c1acb097c7ab3616aafe0  scipy-1.4.1.tar.xz
> 2586c8563cd6693161e13a0ad6fffe06  scipy-1.4.1.zip
>
> SHA256
> ~~
>
> c5cac0c0387272ee0e789e94a570ac51deb01c796b37fb2aad1fb13f85e2f97d
>  scipy-1.4.1-cp35-cp35m-macosx_10_6_intel.whl
> a144811318853a23d32a07bc7fd5561ff0cac5da643d96ed94a4ffe967d89672
>  scipy-1.4.1-cp35-cp35m-manylinux1_i686.whl
> 71eb180f22c49066f25d6df16f8709f215723317cc951d99e54dc88020ea57be
>  scipy-1.4.1-cp35-cp35m-manylinux1_x86_64.whl
> 770254a280d741dd3436919d47e35712fb081a6ff8bafc0f319382b954b77802
>  scipy-1.4.1-cp35-cp35m-win32.whl
> a1aae70d52d0b074d8121333bc807a485f9f1e6a69742010b33780df2e60cfe0
>  scipy-1.4.1-cp35-cp35m-win_amd64.whl
> bb517872058a1f087c4528e7429b4a44533a902644987e7b2fe35ecc223bc408
>  scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl
> dba8306f6da99e37ea08c08fef6e274b5bf8567bb094d1dbe86a20e532aca088
>  scipy-1.4.1-cp36-cp36m-manylinux1_i686.whl
> 386086e2972ed2db17cebf88610aab7d7f6e2c0ca30042dc9a89cf18dcc363fa
>  scipy-1.4.1-cp36-cp36m-manylinux1_x86_64.whl
> 8d3bc3993b8e4be7eade6dcc6fd59a412d96d3a33fa42b0fa45dc9e24495ede9
>  scipy-1.4.1-cp36-cp36m-win32.whl
> dc60bb302f48acf6da8cacfa17d52c63c5415302a9ee77b3b21618090521
>  scipy-1.4.1-cp36-cp36m-win_amd64.whl
> 787cc50cab3020a865640aba3485e9fbd161d4d3b0d03a967df1a2881320512d
>  scipy-1.4.1-cp37-cp37m-macosx_10_6_intel.whl
> 0902a620a381f101e184a958459b36d3ee50f5effd186db76e131cbefcbb96f7
>  scipy-1.4.1-cp37-cp37m-manylinux1_i686.whl
> 00af72998a46c25bdb5824d2b729e7dabec0c765f9deb0b504f928591f5ff9d4
>  scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl
> 9508a7c628a165c2c835f2497837bf6ac80eb25291055f56c129df3c943cbaf8
>  scipy-1.4.1-cp37-cp37m-win32.whl
> a2d6df9eb074af7f08866598e4ef068a2b310d98f87dc23bd1b90ec7bdcec802
>  scipy-1.4.1-cp37-cp37m-win_amd64.whl
> 3092857f36b690a321a662fe5496cb816a7f4eecd875e1d36793d92d3f884073
>  scipy-1.4.1-cp38-cp38-macosx_10_9_x86_64.whl
> 8a07760d5c7f3a92e440ad3aedcc98891e915ce857664282ae3c0220f3301eb6
>  scipy-1.4.1-cp38-cp38-manylinux1_i686.whl
> 1e3190466d669d658233e8a583b854f6386dd62d655539b77b3fa25bfb2abb70
>  scipy-1.4.1-cp38-cp38-manylinux1_x86_64.whl
> cc971a82ea1170e

Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.16.6 release.

2019-12-30 Thread Warren Weckesser
Thanks Chuck!

Warren

On 12/29/19, Charles R Harris  wrote:
> Hi All,
>
> On behalf of the NumPy team I am pleased to announce that NumPy 1.16.6 has
> been released. This release fixes bugs reported against the 1.16.5 release
> and backports several enhancements from master that seem appropriate for an
> LTS series. The supported Python versions are 2.7, 3.5-3.7. This is the
> last release planned that supports Python 2.7.  Wheels for this release can
> be downloaded from PyPI <https://pypi.org/project/numpy/1.16.6>, source
> archives and release notes are available from Github
> <https://github.com/numpy/numpy/releases/tag/v1.16.6>. Downstream
> developers building this release should use Cython >= 0.29.2 and, if using
> OpenBLAS, OpenBLAS >= v0.3.7.
>
> *Highlights*
>
>- The ``np.testing.utils`` functions have been updated from 1.19.0-dev0.
>This improves the function documentation and error messages as well
>extending the ``assert_array_compare`` function to additional types.
>
> *Contributors*
>
> A total of 10 people contributed to this release.
>
>
>- CakeWithSteak
>- Charles Harris
>- Chris Burr
>- Eric Wieser
>- Fernando Saravia
>- Lars Grueter
>- Matti Picus
>- Maxwell Aladago
>- Qiming Sun
>- Warren Weckesser
>
>
> *Pull requests merged*
>
> A total of 14 pull requests were merged for this release.
>
>
>- `#14211 <https://github.com/numpy/numpy/pull/14211>`__: BUG: Fix
>uint-overflow if padding with linear_ramp and negative...
>- `#14275 <https://github.com/numpy/numpy/pull/14275>`__: BUG: fixing to
>allow unpickling of PY3 pickles from PY2
>- `#14340 <https://github.com/numpy/numpy/pull/14340>`__: BUG: Fix
>misuse of .names and .fields in various places (backport...
>- `#14423 <https://github.com/numpy/numpy/pull/14423>`__: BUG: test, fix
>regression in converting to ctypes.
>- `#14434 <https://github.com/numpy/numpy/pull/14434>`__: BUG: Fixed
>maximum relative error reporting in assert_allclose
>- `#14509 <https://github.com/numpy/numpy/pull/14509>`__: BUG: Fix
>regression in boolean matmul.
>- `#14686 <https://github.com/numpy/numpy/pull/14686>`__: BUG: properly
>define PyArray_DescrCheck
>- `#14853 <https://github.com/numpy/numpy/pull/14853>`__: BLD: add 'apt
>update' to shippable
>- `#14854 <https://github.com/numpy/numpy/pull/14854>`__: BUG: Fix
>_ctypes class circular reference. (#13808)
>- `#14856 <https://github.com/numpy/numpy/pull/14856>`__: BUG: Fix
>`np.einsum` errors on Power9 Linux and z/Linux
>- `#14863 <https://github.com/numpy/numpy/pull/14863>`__: BLD: Prevent
>-flto from optimising long double representation...
>- `#14864 <https://github.com/numpy/numpy/pull/14864>`__: BUG: lib: Fix
>histogram problem with signed integer arrays.
>- `#15172 <https://github.com/numpy/numpy/pull/15172>`__: ENH: Backport
>improvements to testing functions.
>- `#15191 <https://github.com/numpy/numpy/pull/15191>`__: REL: Prepare
>for 1.16.6 release.
>
> Cheers,
>
> Charles Harris
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Deprecate numpy.dual?

2020-01-03 Thread Warren Weckesser
In response to some work on improving the documentation of `numpy.linalg`
and how it compares to `scipy.linalg`, Kevin Sheppard suggested that the
documentation of the module `numpy.dual` should also be improved.  When I
mentioned this suggestion in the community meeting on December 11, it was
suggested that we should probably deprecate `numpy.dual`.

I think some current NumPy developers (myself included at the time the
topic came up) are unfamiliar with the history and purpose of this module,
so I spent some time reading code and github issues and wrote up some
notes.  These notes are available at

https://github.com/WarrenWeckesser/numpy-notes/blob/master/numpy-dual.md

If you are not familiar with `numpy.dual`, you might find those notes
useful.

Now that I know a bit more about `numpy.dual`, I'm not sure it should be
deprecated.  It provides a hook for other libraries to selectively replace
the use of the exposed functions in internal NumPy code, so if a library
has a better version of, say, `linalg.eigh`, it can configure `numpy.dual`
to use its version. Then, for example, NumPy multivariate normal
distribution code could benefit from the use of that library's version of
`eigh`.

The NumPy documentation of `numpy.dual` refers specifically to SciPy, but
it could be used by any library.  Does anyone know if any other libraries
use `register_func` to put their functions into the `numpy.dual` namespace?

SciPy currently registers some functions, but there is an open issue in
which it is proposed that SciPy no longer register its functions with
`numpy.dual`:

https://github.com/scipy/scipy/issues/10441

This email is to start the discussion of the future of `numpy.dual`.
Some of the options:

1. Quietly continue the status quo.
2. Deprecate `numpy.dual`.
3. Spend time improving the documentation of this feature, and
   perhaps even expand the functions that are supported.

What do you think?  For those who were involved in the creation of
`numpy.dual`: is it working out like you expected?  If not, is it
worthwhile maintaining it?

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecate numpy.dual?

2020-01-03 Thread Warren Weckesser
On 1/3/20, Sebastian Berg  wrote:
> On Fri, 2020-01-03 at 07:11 -0500, Warren Weckesser wrote:
>> In response to some work on improving the documentation of
>> `numpy.linalg` and how it compares to `scipy.linalg`, Kevin Sheppard
>> suggested that the documentation of the module `numpy.dual` should
>> also be improved.  When I mentioned this suggestion in the community
>> meeting on December 11, it was suggested that we should probably
>> deprecate `numpy.dual`.
>>
>> I think some current NumPy developers (myself included at the time
>> the topic came up) are unfamiliar with the history and purpose of
>> this module, so I spent some time reading code and github issues and
>> wrote up some notes.  These notes are available at
>>
>>
>> https://github.com/WarrenWeckesser/numpy-notes/blob/master/numpy-dual.md
>>
>> If you are not familiar with `numpy.dual`, you might find those notes
>> useful.
>>
>> Now that I know a bit more about `numpy.dual`, I'm not sure it should
>> be deprecated.  It provides a hook for other libraries to selectively
>> replace the use of the exposed functions in internal NumPy code, so
>> if a library has a better version of, say, `linalg.eigh`, it can
>> configure `numpy.dual` to use its version. Then, for example, NumPy
>> multivariate normal distribution code could benefit from the use of
>> that library's version of `eigh`.
>
> That is in principle true, but I do not think we use `dual` at all
> internally right now in numpy, and I doubt there is more than a hand
> full uses out there.

In the notes, I listed the internal uses of `numpy.dual` within numpy
that I found:

1. In the code that generates random variates from the multivariate normal
   distribution, one of `svd`, `eigh` or `cholesky` are used from `numpy.dual`.
2. In `matrixlib/defmatrix.py`, the `.I` property of the `matrix` class
   uses either `inv` or `pinv` from `numpy.dual` to compute its value.
3. The window function `numpy.kaiser` uses `numpy.dual.i0`.


>
> Dual is an override mechanism for functionality on ndarrays implemented
> also by numpy.
>
> In either case, I still tend towards deprecation. It seems to have
> issues and the main use case probably was to improve the situation when
> NumPy was compiled without an optimized BLAS/LAPACK. That probably was
> a common problem at some point, but I am not sure it is still an issue.
>
> Overriding functionality with faster implementations is of course a
> valid use-case and maybe `dual` is not a bad solution to the problem
> [0]. But I think we should discuss this more generally with other
> options. IMO deprecating this practically unused thing now does not
> mean we cannot do something similar in the future.

It probably makes sense to have the general discussion before
deprecating `numpy.dual`--there is a (slim?) chance that `numpy.dual`
will turn out to be the best option.

Warren


>
> - Sebastian
>
>
> [0] It has its own namespace, so is opt-in for the end user. You can
> only support a single backend at a time, although I am not sure that
> matters too much. If overrides provide a function to override, it is
> explicit to the end user as to what gets executed as well.
>
>
>> The NumPy documentation of `numpy.dual` refers specifically to SciPy,
>> but it could be used by any library.  Does anyone know if any other
>> libraries use `register_func` to put their functions into the
>> `numpy.dual` namespace?
>>
>> SciPy currently registers some functions, but there is an open issue
>> in which it is proposed that SciPy no longer register its functions
>> with `numpy.dual`:
>>
>> https://github.com/scipy/scipy/issues/10441
>>
>> This email is to start the discussion of the future of `numpy.dual`.
>> Some of the options:
>>
>> 1. Quietly continue the status quo.
>> 2. Deprecate `numpy.dual`.
>> 3. Spend time improving the documentation of this feature, and
>>perhaps even expand the functions that are supported.
>>
>> What do you think?  For those who were involved in the creation of
>> `numpy.dual`: is it working out like you expected?  If not, is it
>> worthwhile maintaining it?
>>
>> Warren
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Is the build-system section of pyproject.toml a maintained list of the build dependencies?

2020-01-08 Thread Warren Weckesser
I'm doing some work on the travis-ci scripts, and I'd like to remove
some redundant calls of 'pip install'.  The scripts should get the
build dependencies from a configuration file instead of having
hard-coded pip commands.  Is pyproject.toml the appropriate file to
use for this?  (Note: we also have test_requirements.txt, but as the
name says, those are the dependencies for running the tests, not for
building numpy.)

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is the build-system section of pyproject.toml a maintained list of the build dependencies?

2020-01-08 Thread Warren Weckesser
On 1/8/20, Warren Weckesser  wrote:
> I'm doing some work on the travis-ci scripts, and I'd like to remove
> some redundant calls of 'pip install'.  The scripts should get the
> build dependencies from a configuration file instead of having
> hard-coded pip commands.  Is pyproject.toml the appropriate file to
> use for this?  (Note: we also have test_requirements.txt, but as the
> name says, those are the dependencies for running the tests, not for
> building numpy.)
>

Updating my question:  `pyproject.toml` lists numpy's build
dependencies in the `build_system` section of the file:

[build-system]
# Minimum requirements for the build system to execute.
requires = [
"setuptools",
"wheel",
"Cython>=0.29.14",  # Note: keep in sync with tools/cythonize.py
]

So the file serves the equivalent purpose of a `requirements.txt`
file.  Is there an option to pip that would allow something like

pip install -r pyproject.toml

(with some other flag or option as needed) to install the build
requirements found in pyproject.toml?  In
https://github.com/numpy/numpy/pull/15275, I wrote a few lines of
Python to get the dependencies from pyproject.toml, but it seems like
that shouldn't be necessary.

Warren


> Warren
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is the build-system section of pyproject.toml a maintained list of the build dependencies?

2020-01-08 Thread Warren Weckesser
On 1/8/20, Kevin Sheppard  wrote:
> With recent versions of pip it will read the pyproject.toml file to get the
> dependencies, and then install these in an isolated environment to build
> the wheel, and then install the wheel.  The requires=[...] in the pyproject
> is not installed in the original environment, so that when run on NumPy,
> you would only end up with NumPy installed. Are you trying to get Cython
> installed after an install of NumPy?  If you want this then it needs to be
> listed in the setup as a dependency


Thanks Kevin.  I'm cleaning up the shell scripts that we use on
Travis-CI.  There were several redundant uses of `pip install`, some
of which were installing build requirements.  The pull request is
https://github.com/numpy/numpy/pull/15275, but I think the code can be
further simplified.

Does building with setup.py (instead of pip) use pyproject.toml?

Warren



>
> On Wed, Jan 8, 2020 at 4:38 PM Warren Weckesser
> 
> wrote:
>
>> On 1/8/20, Warren Weckesser  wrote:
>> > I'm doing some work on the travis-ci scripts, and I'd like to remove
>> > some redundant calls of 'pip install'.  The scripts should get the
>> > build dependencies from a configuration file instead of having
>> > hard-coded pip commands.  Is pyproject.toml the appropriate file to
>> > use for this?  (Note: we also have test_requirements.txt, but as the
>> > name says, those are the dependencies for running the tests, not for
>> > building numpy.)
>> >
>>
>> Updating my question:  `pyproject.toml` lists numpy's build
>> dependencies in the `build_system` section of the file:
>>
>> [build-system]
>> # Minimum requirements for the build system to execute.
>> requires = [
>> "setuptools",
>> "wheel",
>> "Cython>=0.29.14",  # Note: keep in sync with tools/cythonize.py
>> ]
>>
>> So the file serves the equivalent purpose of a `requirements.txt`
>> file.  Is there an option to pip that would allow something like
>>
>> pip install -r pyproject.toml
>>
>> (with some other flag or option as needed) to install the build
>> requirements found in pyproject.toml?  In
>> https://github.com/numpy/numpy/pull/15275, I wrote a few lines of
>> Python to get the dependencies from pyproject.toml, but it seems like
>> that shouldn't be necessary.
>>
>> Warren
>>
>>
>> > Warren
>> >
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Warren Weckesser
On 3/20/20, Gunter Meissner  wrote:
> Dear Programmers,
>
>
>
> This is Gunter Meissner. I am currently writing a book on Forecasting and
> derived the regression coefficient with Numpy:
>
>
>
> import numpy as np
> X=[1,2,3,4]
> Y=[1,8000,5000,1000]
> print(np.cov(X,Y))
> print(np.var(X))
> Beta1 = np.cov(X,Y)/np.var(X)
> print(Beta1)
>
>
>
> However, Numpy is using the SAMPLE covariance , (which divides by n-1) and
> the POPULATION variance
>
> VarX =   (which divides by n). Therefore the regression coefficient BETA1 is
> not correct.
>
> The solution is easy: Please use the population approach (dividing by n) for
> BOTH covariance and variance or use the sample approach (dividing by n-1)
>
> for BOTH covariance and variance. You may also allow the user to use both as
> in EXCEL, where the user can choose between Var.S and Var.P
>
> and Cov.P and Var.P.
>
>
>
> Thanks!!!
>
> Gunter
>


Gunter,

This is an unfortunate discrepancy in the API:  `var` uses the default
`ddof=0`, while `cov` uses, in effect, `ddof=1` by default.

You can get the consistent behavior you want by using `ddof=1` in both
functions.  E.g.

Beta1 = np.cov(X,Y, ddof=1) / np.var(X, ddof=1)

Using `ddof=1` in `np.cov` is redundant, but in this context, it is
probably useful to make explicit to the reader of the code that both
functions are using the same convention.

Changing the default in either function breaks backwards
compatibility.   That would require a long and potentially painful
deprecation process.

Warren



>
>
>
>
> Gunter Meissner, PhD
>
> University of Hawaii
>
> Adjunct Professor of MathFinance at Columbia University and NYU
>
> President of Derivatives Software www.dersoft.com 
>
>
> CEO Cassandra Capital Management  
> www.cassandracm.com
>
> CV:   www.dersoft.com/cv.pdf
>
> Email:   meiss...@hawaii.edu
>
> Tel: USA (808) 779 3660
>
>
>
>
>
>
>
>
>
> From: NumPy-Discussion
>  On Behalf Of Ralf
> Gommers
> Sent: Wednesday, March 18, 2020 5:16 AM
> To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new
> Datatype System
>
>
>
>
>
>
>
> On Tue, Mar 17, 2020 at 9:03 PM Sebastian Berg   > wrote:
>
> Hi all,
>
> in the spirit of trying to keep this moving, can I assume that the main
> reason for little discussion is that the actual changes proposed are
> not very far reaching as of now?  Or is the reason that this is a
> fairly complex topic that you need more time to think about it?
>
>
>
> Probably (a) it's a long NEP on a complex topic, (b) the past week has been
> a very weird week for everyone (in the extra-news-reading-time I could
> easily have re-reviewed the NEP), and (c) the amount of feedback one expects
> to get on a NEP is roughly inversely proportional to the scope and
> complexity of the NEP contents.
>
>
>
> Today I re-read the parts I commented on before. This version is a big
> improvement over the previous ones. Thanks in particular for adding clear
> examples and the diagram, it helps a lot.
>
>
>
> If it is the latter, is there some way I can help with it?  I tried to
> minimize how much is part of this initial NEP.
>
> If there is not much need for discussion, I would like to officially
> accept the NEP very soon, sending out an official one week notice in
> the next days.
>
>
>
> I agree. I think I would like to keep the option open though to come back to
> the NEP later to improve the clarity of the text about
> motivation/plan/examples/scope, given that this will be the reference for a
> major amount of work for a long time to come.
>
>
>
> To summarize one more time, the main point is that:
>
>
>
> This point seems fine, and I'm +1 for going ahead with the described parts
> of the technical design.
>
>
>
> Cheers,
>
> Ralf
>
>
>
>
> type(np.dtype(np.float64))
>
> will be `np.dtype[float64]`, a subclass of dtype, so that:
>
> issubclass(np.dtype[float64], np.dtype)
>
> is true. This means that we will have one class for every current type
> number: `dtype.num`. The implementation of these subclasses will be a
> C-written (extension) MetaClass, all details of this class are supposed
> to remain experimental in flux at this time.
>
> Cheers
>
> Sebastian
>
>
> On Wed, 2020-03-11 at 17:02 -0700, Sebastian Berg wrote:
>> Hi all,
>>
>> I am pleased to propose NEP 41: First step towards a new Datatype
>> System https://numpy.org/neps/nep-0041-improved-dtype-support.html
>>
>> This NEP motivates the larger restructure of the datatype machinery
>> in
>> NumPy and defines a few fundamental design aspects. The long term
>> user
>> impact will be allowing easier and more rich featured user defined
>> datatypes.
>>
>> As this is a large restructure, the NEP represents only the first
>> steps
>> with some additional information in further NEPs being drafted [1]
>> (this may be helpful to look 

[Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?

2020-04-04 Thread Warren Weckesser
It would be handy if in scipy we can use the function
`numpy.lib.shape_base.normalize_axis_index` as a consistent method for
validating an `axis` argument.  Is this function considered part of
the public API?

There are modules in numpy that do not have leading underscores but
are still usually considered private.  I'm not sure if
`numpy.lib.shape_base` is one of those.  `normalize_axis_index` is not
in the top-level `numpy` namespace, and it is not included in the API
reference 
(https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default),
so I'm not sure if we can safely consider this function to be public.

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?

2020-04-04 Thread Warren Weckesser
On 4/4/20, Warren Weckesser  wrote:
> It would be handy if in scipy we can use the function
> `numpy.lib.shape_base.normalize_axis_index` as a consistent method for
> validating an `axis` argument.  Is this function considered part of
> the public API?
>
> There are modules in numpy that do not have leading underscores but
> are still usually considered private.  I'm not sure if
> `numpy.lib.shape_base` is one of those.  `normalize_axis_index` is not
> in the top-level `numpy` namespace, and it is not included in the API
> reference
> (https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default),
> so I'm not sure if we can safely consider this function to be public.
>
> Warren
>


Answering my own question:

"shape_base.py" is not where `normalize_axis_index` is originally
defined, so that module can be ignored.

The function is actually defined in `numpy.core.multiarray`.  The pull
request in which the function was created is
https://github.com/numpy/numpy/pull/8584. Whether or not the function
was to be public is discussed starting here:
https://github.com/numpy/numpy/pull/8584#issuecomment-281179399.  A
leading underscore was discussed and intentionally not added to the
function.  On the other hand, it was not added to the top-level
namespace, and Eric Wieser wrote "Right now, it is only accessible via
np.core.multiarray.normalize_axis_index, so yes, an internal
function".

There is another potentially useful function, `normalize_axis_tuple`,
defined in `numpy.core.numeric`.  This function is also not in the
top-level numpy namespace.

So it looks like neither of these functions is currently intended to
be public. For the moment, I think we'll create our own utility
functions in scipy.  We can switch to using the numpy functions if
those functions are ever intentionally made public.

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?

2020-04-06 Thread Warren Weckesser
On 4/5/20, Sebastian Berg  wrote:
> On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote:
>> On 4/4/20, Warren Weckesser  wrote:
>> > It would be handy if in scipy we can use the function
>> > `numpy.lib.shape_base.normalize_axis_index` as a consistent method
>> > for
>> > validating an `axis` argument.  Is this function considered part of
>> > the public API?
>> >
>> > There are modules in numpy that do not have leading underscores but
>> > are still usually considered private.  I'm not sure if
>> > `numpy.lib.shape_base` is one of those.  `normalize_axis_index` is
>> > not
>> > in the top-level `numpy` namespace, and it is not included in the
>> > API
>> > reference
>> > (
>> > https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default
>> > ),
>> > so I'm not sure if we can safely consider this function to be
>> > public.
>> >
>
> I do not see a reason why we should not make those functions public.
> The only thing I see is that they are maybe not really required in the
> main namespace, i.e. you can be expected to use::
>
> from numpy.something import normalize_axis_tuple
>
> I think, since this is a function for library authors more than end-
> users. And we do not have much prior art around where to put something
> like that.
>
> Cheers,
>
> Sebastian


Thanks, Sebastian.  For now, I proposed a private Python
implementation in scipy: https://github.com/scipy/scipy/pull/11797
If the numpy version is added to the public numpy API, it will be easy
to change scipy to use it.

Warren



>
>
>
>> > Warren
>> >
>>
>> Answering my own question:
>>
>> "shape_base.py" is not where `normalize_axis_index` is originally
>> defined, so that module can be ignored.
>>
>> The function is actually defined in `numpy.core.multiarray`.  The
>> pull
>> request in which the function was created is
>> https://github.com/numpy/numpy/pull/8584. Whether or not the function
>> was to be public is discussed starting here:
>> https://github.com/numpy/numpy/pull/8584#issuecomment-281179399.  A
>> leading underscore was discussed and intentionally not added to the
>> function.  On the other hand, it was not added to the top-level
>> namespace, and Eric Wieser wrote "Right now, it is only accessible
>> via
>> np.core.multiarray.normalize_axis_index, so yes, an internal
>> function".
>>
>> There is another potentially useful function, `normalize_axis_tuple`,
>> defined in `numpy.core.numeric`.  This function is also not in the
>> top-level numpy namespace.
>>
>> So it looks like neither of these functions is currently intended to
>> be public. For the moment, I think we'll create our own utility
>> functions in scipy.  We can switch to using the numpy functions if
>> those functions are ever intentionally made public.
>>
>> Warren
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?

2020-04-06 Thread Warren Weckesser
On 4/6/20, Ralf Gommers  wrote:
> On Mon, Apr 6, 2020 at 3:31 PM Eric Wieser 
> wrote:
>
>> When I added this function, it was always my intent for it to be consumed
>> by downstream packages, but as Sebastian remarks, it wasn't really
>> desirable to put it in the top-level namespace.
>>
>
> This is a nice function indeed, +1 for making it public.
>
> Regarding namespace, it would be nice to decouple the `numpy` and
> `numpy.lib` namespaces, so we can put this in `numpy.lib` and say that's
> where library author functions go from now on. That'd be better than making
> all `numpy.lib.*` submodules public.
>
> Cheers,
> Ralf
>

Thanks all.  So far, it looks like folks are in favor of ensuring that
`normalize_axis_index` is public.  So I'll remove the implementation
from the scipy PR, and use the one in numpy.  For the current and
older releases of numpy, scipy can import the function
`numpy.core.multiarray`.  If a newer version of numpy is found, scipy
can grab it from wherever it is decided its public home should be.

Can we also make `normalize_axis_tuple` public?  Currently it resides
in `numpy.core.numeric`.

Warren

>
>
>>
>> I think I would be reasonably happy to make the guarantee that it would
>> not be removed (or more likely, moved) without a lengthy deprecation
>> cycle.
>>
>> Perhaps worth opening a github issue, so we can keep track of how many
>> downstream projects are already using it.
>>
>> Eric
>>
>> On Sun, 5 Apr 2020 at 15:06, Sebastian Berg 
>> wrote:
>>
>>> On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote:
>>> > On 4/4/20, Warren Weckesser  wrote:
>>> > > It would be handy if in scipy we can use the function
>>> > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method
>>> > > for
>>> > > validating an `axis` argument.  Is this function considered part of
>>> > > the public API?
>>> > >
>>> > > There are modules in numpy that do not have leading underscores but
>>> > > are still usually considered private.  I'm not sure if
>>> > > `numpy.lib.shape_base` is one of those.  `normalize_axis_index` is
>>> > > not
>>> > > in the top-level `numpy` namespace, and it is not included in the
>>> > > API
>>> > > reference
>>> > > (
>>> > >
>>> https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default
>>> > > ),
>>> > > so I'm not sure if we can safely consider this function to be
>>> > > public.
>>> > >
>>>
>>> I do not see a reason why we should not make those functions public.
>>> The only thing I see is that they are maybe not really required in the
>>> main namespace, i.e. you can be expected to use::
>>>
>>> from numpy.something import normalize_axis_tuple
>>>
>>> I think, since this is a function for library authors more than end-
>>> users. And we do not have much prior art around where to put something
>>> like that.
>>>
>>> Cheers,
>>>
>>> Sebastian
>>>
>>>
>>>
>>> > > Warren
>>> > >
>>> >
>>> > Answering my own question:
>>> >
>>> > "shape_base.py" is not where `normalize_axis_index` is originally
>>> > defined, so that module can be ignored.
>>> >
>>> > The function is actually defined in `numpy.core.multiarray`.  The
>>> > pull
>>> > request in which the function was created is
>>> > https://github.com/numpy/numpy/pull/8584. Whether or not the function
>>> > was to be public is discussed starting here:
>>> > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399.  A
>>> > leading underscore was discussed and intentionally not added to the
>>> > function.  On the other hand, it was not added to the top-level
>>> > namespace, and Eric Wieser wrote "Right now, it is only accessible
>>> > via
>>> > np.core.multiarray.normalize_axis_index, so yes, an internal
>>> > function".
>>> >
>>> > There is another potentially useful function, `normalize_axis_tuple`,
>>> > defined in `numpy.core.numeric`.  This function is also not in the
>>> > top-level numpy namespace.
>>> >
>>> > So it looks like neither of these functions is currently intended to
>>> > be public. For the moment, I think we'll create our own utility
>>> > functions in scipy.  We can switch to using the numpy functions if
>>> > those functions are ever intentionally made public.
>>> >
>>> > Warren
>>> > ___
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@python.org
>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>> >
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.18.3 released.

2020-04-19 Thread Warren Weckesser
On 4/19/20, Charles R Harris  wrote:
> Hi All,
>
> On behalf of the NumPy team I am pleased to announce that NumPy 1.18.3 has
> been released. This release contains various bug/regression fixes for the
> 1.18 series


Thanks Chuck!

Warren

>
> The Python versions supported in this release are 3.5-3.8. Downstream
> developers should use Cython >= 0.29.15 for Python 3.8 support and OpenBLAS
>>= 3.7 to avoid errors on the Skylake architecture.  Wheels for this
> release can be downloaded from PyPI
> ,
> source archives and release notes are available from Github
> .
>
> *Highlights*
>
> Fix for the method='eigh' and method='cholesky' options in
> numpy.random.multivariate_normal. Those were producing samples from the
> wrong distribution.
>
> *Contributors*
>
> A total of 6 people contributed to this release.  People with a "+" by
> their
> names contributed a patch for the first time.
>
>- Charles Harris
>- Max Balandat +
>- @Mibu287 +
>- Pan Jan +
>- Sebastian Berg
>- @panpiort8 +
>
>
>
> *Pull requests merged*
> A total of 5 pull requests were merged for this release.
>
>- #15916: BUG: Fix eigh and cholesky methods of
>numpy.random.multivariate_normal
>- #15929: BUG,MAINT: Remove incorrect special case in string to
> number...
>- #15930: BUG: Guarantee array is in valid state after memory error
>occurs...
>- #15954: BUG: Check that pvals is 1D in _generator.multinomial.
>- #16017: BUG: Alpha parameter must be 1D in _generator.dirichlet
>
>
> Cheers,
>
> Charles Harris
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Update the Code of Conduct Committee Membership (new members wanted)

2020-05-02 Thread Warren Weckesser
On 5/2/20, Ralf Gommers  wrote:
> On Thu, Apr 23, 2020 at 11:38 PM Sebastian Berg
> 
> wrote:
>
>> Hi all,
>>
>> it has come up in the last community call that many of our committee
>> membership lists have not been updated in a while.
>> This is not a big issue as such.  But, while these committees are not
>> very active on a day-to-day basis, they are an important part of the
>> community and it is better to update them regularly and thus also
>> ensure they remain representative of the community.
>>
>
> Thanks Sebastian!
>
>
>> We would like to start by updating the members of the Code of Conduct
>> (CoC) committee.  The CoC committee is in charge of responding and
>> following up to any reports of CoC breaches, as stated in:
>>
>>
>> https://docs.scipy.org/doc/numpy/dev/conduct/code_of_conduct.html#incident-reporting-resolution-code-of-conduct-enforcement
>>
>> If you are interested in or happy to serve on our CoC committee please
>> let me or e.g. Ralf Gommers know, join the next community meeting
>> (April 29th, 11:00PDT/18:00UTC), or reply on the list.
>>
>> I hope we will be able to discuss and reach a consensus between those
>> interested and involved quickly (possibly already on the next community
>> call).  In either case, before any changes they will be run by the
>> mailing list to ensure community consensus.
>>
>
> Following up on this: Melissa and Anirudh both volunteered for this (thank
> you!), and in the last community call we discussed this (thumbs up from
> everyone there), and gave me the assignment to follow up on this list.
>
> Both Melissa and Anirudh have experience with CoC's, Melissa for the SciPy
> conference and Anirudh in the MXNet community. They're also two of the most
> active current contributors. So it will be great to have them on the
> committee.
>
> We also discussed that it would be good to have at least one current member
> remain, to have one steering council member who knows the project history
> well on the committee. Both Stefan and I have said that we're happy to stay
> on. So I would suggest that Stefan and I get together and figure out who of
> us that will be. And then we update the website and the CoC committee's
> private email list.
>

Sounds good.  Thanks Sebastian, Ralf, Anirudh and Melissa!

Warren


> Cheers,
> Ralf
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new numpy.org is live

2020-05-24 Thread Warren Weckesser
On 5/24/20, Inessa Pawson  wrote:
> The NumPy web team is excited to announce the launch of the newly
> redesigned numpy.org. To transform the website into a comprehensive, yet
> user-centric, resource of all things NumPy was a primary focus of this
> months-long effort. We thank Joe LaChance, Ralf Gommers, Shaloo Shalini,
> Shekhar Prasad Rajak, Ross Barnowski, and Mars Lee for their extensive
> contributions to the project.


Beautiful!  Thanks for all the hard work that went into this fantastic update.

Warren


>
> The new site features a curated collection of NumPy related educational
> resources for every user level, an overview of the entire Python scientific
> computing ecosystem, and several case studies highlighting the importance
> of the library to the many advances in scientific research as well as the
> industry in recent years. The “Install” and “Get Help” pages offer advice
> on how to find answers to installation and usage questions, while those who
> are looking to connect with others within our large and diverse community
> will find the “Community” page very helpful.
>
> The new website will be updated on a regular basis with news about the
> NumPy project development milestones, community initiatives and events.
> Visitors are encouraged to explore the website and sign up for the
> newsletter.
>
> Next, the NumPy web team will focus on updating graphics and project
> identity (a new logo is coming!), adding an installation widget and
> translations, better integrating the project documentation via the new
> Sphinx theme, and improving the interactive terminal experience. Also, we
> are looking to expand our portfolio of case studies and would appreciate
> any assistance in this matter.
>
> Best regards,
> Inessa Pawson
> NumPy web team
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] ANN: SciPy 1.5.0

2020-06-22 Thread Warren Weckesser
--
> The output length of `scipy.signal.upfirdn` has been corrected, resulting
> outputs may now be shorter for some combinations of up/down ratios and
> input
> signal and filter lengths.
>
> `scipy.signal.resample` now supports a ``domain`` keyword argument for
> specification of time or frequency domain input.
>
> Other changes
> =
> Improved support for leveraging 64-bit integer size from linear algebra
> backends
> in several parts of the SciPy codebase.
>
> Shims designed to ensure the compatibility of SciPy with Python 2.7 have
> now
> been removed.
>
> Many warnings due to unused imports and unused assignments have been
> addressed.
>
> Many usage examples were added to function docstrings, and many input
> validations and intuitive exception messages have been added throughout the
> codebase.
>
> Early stage adoption of type annotations in a few parts of the codebase
>
>
> Authors
> ===
>
> * @endolith
> * Hameer Abbasi
> * ADmitri +
> * Wesley Alves +
> * Berkay Antmen +
> * Sylwester Arabas +
> * Arne Küderle +
> * Christoph Baumgarten
> * Peter Bell
> * Felix Berkenkamp
> * Jordão Bragantini +
> * Clemens Brunner +
> * Evgeni Burovski
> * Matthias Bussonnier +
> * CJ Carey
> * Derrick Chambers +
> * Leander Claes +
> * Christian Clauss
> * Luigi F. Cruz +
> * dankleeman
> * Andras Deak
> * Milad Sadeghi DM +
> * jeremie du boisberranger +
> * Stefan Endres
> * Malte Esders +
> * Leo Fang +
> * felixhekhorn +
> * Isuru Fernando
> * Andrew Fowlie
> * Lakshay Garg +
> * Gaurav Gijare +
> * Ralf Gommers
> * Emmanuelle Gouillart +
> * Kevin Green +
> * Martin Grignard +
> * Maja Gwozdz
> * Sturla Molden
> * gyu-don +
> * Matt Haberland
> * hakeemo +
> * Charles Harris
> * Alex Henrie
> * Santi Hernandez +
> * William Hickman +
> * Till Hoffmann +
> * Joseph T. Iosue +
> * Anany Shrey Jain
> * Jakob Jakobson
> * Charles Jekel +
> * Julien Jerphanion +
> * Jiacheng-Liu +
> * Christoph Kecht +
> * Paul Kienzle +
> * Reidar Kind +
> * Dmitry E. Kislov +
> * Konrad +
> * Konrad0
> * Takuya KOUMURA +
> * Krzysztof Pióro
> * Peter Mahler Larsen
> * Eric Larson
> * Antony Lee
> * Gregory Lee +
> * Gregory R. Lee
> * Chelsea Liu
> * Cong Ma +
> * Kevin Mader +
> * Maja Gwóźdź +
> * Alex Marvin +
> * Matthias Kümmerer
> * Nikolay Mayorov
> * Mazay0 +
> * G. D. McBain
> * Nicholas McKibben +
> * Sabrina J. Mielke +
> * Sebastian J. Mielke +
> * Miloš Komarčević +
> * Shubham Mishra +
> * Santiago M. Mola +
> * Grzegorz Mrukwa +
> * Peyton Murray
> * Andrew Nelson
> * Nico Schlömer
> * nwjenkins +
> * odidev +
> * Sambit Panda
> * Vikas Pandey +
> * Rick Paris +
> * Harshal Prakash Patankar +
> * Balint Pato +
> * Matti Picus
> * Ilhan Polat
> * poom +
> * Siddhesh Poyarekar
> * Vladyslav Rachek +
> * Bharat Raghunathan
> * Manu Rajput +
> * Tyler Reddy
> * Andrew Reed +
> * Lucas Roberts
> * Ariel Rokem
> * Heshy Roskes
> * Matt Ruffalo
> * Atsushi Sakai +
> * Benjamin Santos +
> * Christoph Schock +
> * Lisa Schwetlick +
> * Chris Simpson +
> * Leo Singer
> * Kai Striega
> * Søren Fuglede Jørgensen
> * Kale-ab Tessera +
> * Seth Troisi +
> * Robert Uhl +
> * Paul van Mulbregt
> * Vasiliy +
> * Isaac Virshup +
> * Pauli Virtanen
> * Shakthi Visagan +
> * Jan Vleeshouwers +
> * Sam Wallan +
> * Lijun Wang +
> * Warren Weckesser
> * Richard Weiss +
> * wenhui-prudencemed +
> * Eric Wieser
> * Josh Wilson
> * James Wright +
> * Ruslan Yevdokymov +
> * Ziyao Zhang +
>
> A total of 129 people contributed to this release.
> People with a "+" by their names contributed a patch for the first time.
> This list of names is automatically generated, and may not be fully
> complete.
>
> Issues closed for 1.5.0
> ---
>
> * `#1455 <https://github.com/scipy/scipy/issues/1455>`__: ellipord does
> returns bogus values if gstop or gpass are negative...
> * `#1968 <https://github.com/scipy/scipy/issues/1968>`__: correlate2d's
> output does not agree with correlate's output in...
> * `#2744 <https://github.com/scipy/scipy/issues/2744>`__: BUG: optimize:
> '\*\*kw' argument of 'newton_krylov' is not documented
> * `#4755 <https://github.com/scipy/scipy/issues/4755>`__: TypeError: data
> type " * `#4921 <https://github.com/scipy/scipy/issues/4921>`__: scipy.optimize
> maxiter option not working as expected
> * `#5144 <https://github.com/scipy/scipy/issues/5144>`__: RuntimeWarning on
&

[Numpy-discussion] NumPy Development Meeting Today - Triage Focus

2020-07-29 Thread Warren Weckesser
Hi all,

Sorry for the short notice--Sebastian is off this week, and the rest
of us forgot to send the email reminder.

Our bi-weekly triage-focused NumPy development meeting is in 10
minutes (today, Wednesday, July 29th, at 11 am Pacific Time (18:00
UTC)).  Everyone is invited to join in and edit the work-in-progress
meeting topics and notes:

https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel
should be prioritized or simply discussed briefly. Just comment on it
so we can label it, or add your PR/issue to this weeks topics for
discussion.

Best regards

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] New random.Generator method: permuted

2020-08-03 Thread Warren Weckesser
In one of the previous weekly zoom meetings, it was suggested
to ping the mailing list about an updated PR that implements
the `permuted` method for the Generator class in numpy.random.
The relevant issue is

https://github.com/numpy/numpy/issues/5173

and the PR is

https://github.com/numpy/numpy/pull/15121

The new method (as it would be called from Python) is

permuted(x, axis=None, out=None)

The CircleCI rendering of the docstring from the pull request is


https://14745-908607-gh.circle-artifacts.com/0/doc/build/html/reference/random/generated/numpy.random.Generator.permuted.html

The new method is an alternative to the existing `shuffle` and
`permutation` methods.  It handles the `axis` parameter similar
to how the sort methods do, i.e. when `axis` is given, the slices
along the axis are shuffled independently.  This new documentation
(added as part of the pull request) explains the API of the various
related methods:


https://14745-908607-gh.circle-artifacts.com/0/doc/build/html/reference/random/generator.html#permutations

Additional feedback on the implementation of `permuted` in the
pull request is welcome.  Further discussion of the API should
be held in the issue gh-5173 (but please familiarize yourself
with the discussion of the API in gh-5173--there has already
been quite a long discussion of several different APIs).

Thanks,

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new function: broadcast_shapes

2020-10-15 Thread Warren Weckesser
On 10/15/20, Madhulika Jain Chambers  wrote:
> Hello all,
>
> I opened a PR to add a function which returns the broadcasted shape from a
> given set of shapes:
> https://github.com/numpy/numpy/pull/17535
>
> As this is a proposed change to the API, I wanted to see if there was any
> feedback from the list.


Thanks, this is useful!  I've implemented something similar many times
over the years, and could have used it in some SciPy code, where we
currently have a private implementation in one of the `stats` modules.

Warren


>
> Thanks so much,
>
> Madhulika
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] ANN: SciPy 1.5.3

2020-10-19 Thread Warren Weckesser
On 10/17/20, Tyler Reddy  wrote:
> Hi all,
>
> On behalf of the SciPy development team I'm pleased to announce
> the release of SciPy 1.5.3, which is a bug fix release that includes
> Linux ARM64 wheels for the first time.


Thanks Tyler!  A lot of work goes into a SciPy release, so I'm
grateful you continue to manage the releases so well.

Warren


>
> Sources and binary wheels can be found at:
> https://pypi.org/project/scipy/
> and at: https://github.com/scipy/scipy/releases/tag/v1.5.3
>
> One of a few ways to install this release with pip:
>
> pip install scipy==1.5.3
>
> ==
> SciPy 1.5.3 Release Notes
> ==
>
> SciPy 1.5.3 is a bug-fix release with no new features
> compared to 1.5.2. In particular, Linux ARM64 wheels are now
> available and a compatibility issue with XCode 12 has
> been fixed.
>
> Authors
> ==
>
> * Peter Bell
> * CJ Carey
> * Thomas Duvernay +
> * Gregory Lee
> * Eric Moore
> * odidev
> * Dima Pasechnik
> * Tyler Reddy
> * Simon Segerblom Rex +
> * Daniel B. Smith
> * Will Tirone +
> * Warren Weckesser
>
> A total of 12 people contributed to this release.
> People with a "+" by their names contributed a patch for the first time.
> This list of names is automatically generated, and may not be fully
> complete.
>
> Issues closed for 1.5.3
> --
>
> * `#9611 <https://github.com/scipy/scipy/issues/9611>`__: Overflow error
> with new way of p-value calculation in kendall...
> * `#10069 <https://github.com/scipy/scipy/issues/10069>`__:
> scipy.ndimage.watershed_ift regression in 1.0.0
> * `#11260 <https://github.com/scipy/scipy/issues/11260>`__: BUG: DOP853
> with complex data computes complex error norm, causing...
> * `#11479 <https://github.com/scipy/scipy/issues/11479>`__: RuntimeError:
> dictionary changed size during iteration on loading...
> * `#11972 <https://github.com/scipy/scipy/issues/11972>`__: BUG (solved):
> Error estimation in DOP853 ODE solver fails for...
> * `#12543 <https://github.com/scipy/scipy/issues/12543>`__: BUG: Picture
> rotated 180 degrees and rotated -180 degrees should...
> * `#12613 <https://github.com/scipy/scipy/issues/12613>`__: Travis X.4 and
> X.7 failures in master
> * `#12654 <https://github.com/scipy/scipy/issues/12654>`__:
> scipy.stats.combine_pvalues produces wrong results with
> method='mudholkar_george'
> * `#12819 <https://github.com/scipy/scipy/issues/12819>`__: BUG: Scipy
> Sparse slice indexing assignment Bug with zeros
> * `#12834 <https://github.com/scipy/scipy/issues/12834>`__: BUG: ValueError
> upon calling Scipy Interpolator objects
> * `#12836 <https://github.com/scipy/scipy/issues/12836>`__: ndimage.median
> can return incorrect values for integer inputs
> * `#12860 <https://github.com/scipy/scipy/issues/12860>`__: Build failure
> with Xcode 12
>
> Pull requests for 1.5.3
> -
>
> * `#12611 <https://github.com/scipy/scipy/pull/12611>`__: MAINT: prepare
> for SciPy 1.5.3
> * `#12614 <https://github.com/scipy/scipy/pull/12614>`__: MAINT: prevent
> reverse broadcasting
> * `#12617 <https://github.com/scipy/scipy/pull/12617>`__: MAINT: optimize:
> Handle nonscalar size 1 arrays in fmin_slsqp...
> * `#12623 <https://github.com/scipy/scipy/pull/12623>`__: MAINT: stats:
> Loosen some test tolerances.
> * `#12638 <https://github.com/scipy/scipy/pull/12638>`__: CI, MAINT: pin
> pytest for Azure win
> * `#12668 <https://github.com/scipy/scipy/pull/12668>`__: BUG: Ensure
> factorial is not too large in mstats.kendalltau
> * `#12705 <https://github.com/scipy/scipy/pull/12705>`__: MAINT:
> \`openblas_support\` added sha256 hash
> * `#12706 <https://github.com/scipy/scipy/pull/12706>`__: BUG: fix
> incorrect 1d case of the fourier_ellipsoid filter
> * `#12721 <https://github.com/scipy/scipy/pull/12721>`__: BUG: use
> special.sindg in ndimage.rotate
> * `#12724 <https://github.com/scipy/scipy/pull/12724>`__: BUG: per #12654
> adjusted mudholkar_george method to combine p...
> * `#12726 <https://github.com/scipy/scipy/pull/12726>`__: BUG: Fix DOP853
> error norm for complex problems
> * `#12730 <https://github.com/scipy/scipy/pull/12730>`__: CI: pin xdist for
> Azure windows
> * `#12786 <https://github.com/scipy/scipy/pull/12786>`__: BUG: stats: Fix
> formula in the \`stats\` method of the ARGUS...
> * `#12795 <https://github.com/scipy/scipy/pull/12795>`__: CI: Pin
> setuptools on windows CI
> * `#12830 <https://github.com/scipy/scipy/pull/12830&g

Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.19.3 release

2020-10-28 Thread Warren Weckesser
On 10/28/20, Charles R Harris  wrote:
> Hi All,
>
> On behalf of the NumPy team I am pleased to announce that NumPy 1.19.3 has
> been released. NumPy 1.19.3 is a small maintenance release with two major
> improvements:
>
>- Python 3.9 binary wheels on all supported platforms,
>- OpenBLAS fixes for Windows 10 version 2004 fmod bug.
>
> This release supports Python 3.6-3.9 and is linked with OpenBLAS 3.12 to
> avoid some of the fmod problems on Windows 10 version 2004. Microsoft is
> aware of the problem and users should upgrade when the fix becomes
> available, the fix here is limited in scope.
>
> NumPy Wheels for this release can be downloaded from the PyPI
> , source archives, release notes,
> and wheel hashes are available on Github
> . Linux users will
> need pip >= 0.19.3 in order to install manylinux2010 and manylinux2014
> wheels.
>
> *Contributors*
>
> A total of 8 people contributed to this release.  People with a "+" by
> their
> names contributed a patch for the first time.
>
>
>- Charles Harris
>- Chris Brown +
>- Daniel Vanzo +
>- E. Madison Bray +
>- Hugo van Kemenade +
>- Ralf Gommers
>- Sebastian Berg
>- @danbeibei +
>
>
>
> *Pull requests merged*
> A total of 10 pull requests were merged for this release.
>
>- #17298: BLD: set upper versions for build dependencies
>- #17336: BUG: Set deprecated fields to null in PyArray_InitArrFuncs
>- #17446: ENH: Warn on unsupported Python 3.10+
>- #17450: MAINT: Update test_requirements.txt.
>- #17522: ENH: Support for the NVIDIA HPC SDK nvfortran compiler
>- #17568: BUG: Cygwin Workaround for #14787 on affected platforms
>- #17647: BUG: Fix memory leak of buffer-info cache due to relaxed
>strides
>- #17652: MAINT: Backport openblas_support from master.
>- #17653: TST: Add Python 3.9 to the CI testing on Windows, Mac.
>- #17660: TST: Simplify source path names in test_extending.
>
> Cheers,
>
> Charles Harris
>


Thanks for managing the release, Chuck!

Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler

2021-03-06 Thread Warren Weckesser
On 3/6/21, zoj613  wrote:
> Hi All,
>
> I noticed that the transformed rejection method for generating Poisson
> random variables used in numpy makes use of the `random_loggam` function
> which directly calculates the log-gamma function. It appears that a
> log-factorial lookup table was added a few years back which could be used
> in
> place of random_loggam since the input is always an integer. Is there a
> reason for not using this table instead? See link below for the line of
> code:
>
> https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572
>
> Regards
> Zolisa
>

Hi Zolisa,

In the pull request where the C function logfactorial was added
(https://github.com/numpy/numpy/pull/13761), I originally modified the
Poisson code to use logfactorial as you suggest, but Kevin (@bashtage
on github) pointed out that the change could potentially alter the
random stream for the legacy version. Making the change requires
creating separate C functions, one for the legacy code that remains
unchanged, and one for the newer Generator class that would use
logfactorial.  You can see the comments here (click on "Show
resolved"):

https://github.com/numpy/numpy/pull/13761#pullrequestreview-249973405

At the time, making that change was not a high priority, so I didn't
pursue it. It does make sense to use the logfactorial function there,
and I'd be happy to see it updated, but be aware that making the
change is more work than changing just the function call.

Warren

>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Ask GitHub to provide an option to not render .rst files

2021-11-22 Thread Warren Weckesser
Hey all,

If you've ever tried to inspect a file on github with the `.rst`
extension, there's a good chance that you were frustrated by GitHub
providing a rendered view *only* of the file, with no option to view
the source code like any other text file.   It is certainly nice to
have a rendered view, but often I want to inspect the actual source
code (e.g. to find out at which line a heading occurs, perhaps to
include a link to it in a pull request).  There is the "raw" option,
or you could click "edit", but what is really desired is a view of the
source like any other source code.

Files with the `.md` extension are also rendered by default, but there
are buttons that allow you to either "Display the source blob" or
"Display the rendered blob". There is no such option for `.rst` files.
If they can do it for `.md` files, it seems like it should be easy to
do the same for `.rst` files.

I've tried creating a ticket on github about this, but it seems like
tickets go to the wrong group.  The response I got was from the
"GitHub Support" team, and they said they forwarded the request to the
"Product" team.  (It's all GitHub to me.)  It was also suggested that
I bring this up in a public feedback discussions, so I did:

https://github.com/github/feedback/discussions/7999

If you have a moment, could you add a comment, or click the upvote
button, or add some other feedback to the discussion?  It would be
nice to get this simple enhancement into the GitHub site.

Thanks,

Warren
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: An article on numpy data types

2021-12-28 Thread Warren Weckesser
On 12/28/21, Lev Maximov  wrote:
> On Tue, Dec 28, 2021 at 3:43 PM Evgeni Burovski
> 
> wrote:
>
>> Very nice overview!
>>
>> One question and one suggestion:
>>
>> 1. Is integer wraparound guaranteed for signed ints, or is it an
>> implementation detail? For unsigned ints, sure, it's straight from a C
>> standard; what about signed types however.
>>
> Signed ints wraparound in just the same way as unsigned, both in C and in
> NumPy. Added an illustration.

Overflow of *signed* ints in the C language is *undefined behavior*.
In practice, most compilers might do what you expect, but the
wrap-around behavior is not guaranteed and should not be relied on.

Warren


>
>
>> 2. It'd be nice to explicitly stress that dtype=float corresponds to a C
>> double, not a C float type. This frequently trips people trying to
>> interface with C or Cython (in my experience)
>>
> Done, thanks!
>
> Best regards,
> Lev
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Performance mystery

2022-01-18 Thread Warren Weckesser
In the script below, the evaluation of the expression `z.real**2 +
z.imag**2` is timed using the `timeit` module. `z` is a 1D array of
random samples with dtype `np.complex128` and with length 25.

The mystery is the change in performance of the calculation from the
first array to which it is applied to the second.  The output of the
script is

```
numpy version 1.23.0.dev0+460.gc30876f64

 619.7096 microseconds
 625.3833 microseconds
 634.8389 microseconds

 137.0659 microseconds
 137.5231 microseconds
 137.5582 microseconds
```

Each batch of three timings corresponds to repeating the timeit
operation three times on the same random array `z`; i.e. a new array
`z` is generated for the second batch.  The question is why is does it
take so much longer to evaluate the expression the first time?

Some other details:

* If I change the expression to, say, `'z.real + z.imag'`, the huge
disparity disappears.
* If I generate more random `z` arrays, the performance remains at the
level of approximately 140 usec.
* I used the main branch of numpy for the above output, but the same
thing happens with 1.20.3, so this is not the result of a recent
change.
* So far, when I run the script, I always see output like that shown
above: the time required for the first random array is typically four
times that required for the second array.  If I run similar commands
in ipython, I have seen the slow case repeated several times (with
newly generated random arrays), but eventually the time drops down to
140 usec (or so), and I don't see the slow case anymore.
* I'm using a 64 bit Linux computer:
  ```
  $ uname -a
  Linux pop-os 5.15.8-76051508-generic
#202112141040~1639505278~21.10~0ede46a SMP Tue Dec 14 22:38:29 U
x86_64 x86_64 x86_64 GNU/Linux
  ```

Any ideas?

Warren

Here's the script:

```
import timeit
import numpy as np


def generate_sample(n, rng):
return rng.normal(scale=1000, size=2*n).view(np.complex128)


print(f'numpy version {np.__version__}')
print()

rng = np.random.default_rng()
n = 25
timeit_reps = 1

expr = 'z.real**2 + z.imag**2'

z = generate_sample(n, rng)
for _ in range(3):
t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
print(f"{1e6*t/timeit_reps:9.4f} microseconds")
print()

z = generate_sample(n, rng)
for _ in range(3):
t = timeit.timeit(expr, globals=globals(), number=timeit_reps)
print(f"{1e6*t/timeit_reps:9.4f} microseconds")
print()
```
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Performance mystery

2022-01-23 Thread Warren Weckesser
On 1/20/22, Francesc Alted  wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted  wrote:
>
>> On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert 
>> wrote:
>>
>>> Given that this seems to be Linux only, is this related to how glibc
>>> does
>>> large allocations (>128kB) using mmap()?
>>>
>>> https://stackoverflow.com/a/33131385
>>>
>>
>> That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of `z` is
>> almost 4 MB, mmap machinery is probably getting involved here.  Also, as
>> pages acquired via anonymous mmap are not actually allocated until you
>> access them the first time, that would explain that the first access is
>> slow.  What puzzles me is that the timeit loops access `z` data 3*1
>> times, which is plenty of time for doing the allocation (just should
>> require just a single iteration).
>>
>
> I think I have more evidence that what is happening here has to see of how
> the malloc mechanism works in Linux.  I find the next explanation to be
> really good:
>
> https://sourceware.org/glibc/wiki/MallocInternals
>
> In addition, this excerpt of the mallopt manpage (
> https://man7.org/linux/man-pages/man3/mallopt.3.html) is very significant:
>
>   Note: Nowadays, glibc uses a dynamic mmap threshold by
>   default.  The initial value of the threshold is 128*1024,
>   but when blocks larger than the current threshold and less
>   than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the
>   threshold is adjusted upward to the size of the freed
>   block.  When dynamic mmap thresholding is in effect, the
>   threshold for trimming the heap is also dynamically
>   adjusted to be twice the dynamic mmap threshold.  Dynamic
>   adjustment of the mmap threshold is disabled if any of the
>   M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or
>   M_MMAP_MAX parameters is set.
>
> This description matches closely what is happening here: after `z` is freed
> (replaced by another random array in the second part of the calculation),
> then dynamic mmap threshold enters and the threshold is increased by 2x of
> the freed block (~4MB in this case), so for the second part, the program
> break
> (i.e. where the heap ends) is increased instead, which is faster because
> this memory does not need to be zeroed before use.
>
> Interestingly, the M_MMAP_THRESHOLD for system malloc can be set by using
> the MALLOC_MMAP_THRESHOLD_ environment variable.  For example, the original
> times are:
>
> $ python mmap-numpy.py
> numpy version 1.20.3
>
>  635.4752 microseconds
>  635.8906 microseconds
>  636.0661 microseconds
>
>  144.7238 microseconds
>  143.9147 microseconds
>  144.0621 microseconds
>
> but if we enforce to always use mmap:
>
> $ MALLOC_MMAP_THRESHOLD_=0 python mmap-numpy.py
> numpy version 1.20.3
>
>  628.8890 microseconds
>  628.0965 microseconds
>  628.7590 microseconds
>
>  640.9369 microseconds
>  641.5104 microseconds
>  642.4027 microseconds
>
> so first and second parts executes at the same (slow) speed.  And, if we
> set the threshold to be exactly 4 MB:
>
> $ MALLOC_MMAP_THRESHOLD_=4194304 python mmap-numpy.py
> numpy version 1.20.3
>
>  630.7381 microseconds
>  631.3634 microseconds
>  632.2200 microseconds
>
>  382.6925 microseconds
>  380.1790 microseconds
>  380.0340 microseconds
>
> we see how performance is increased for the second part (although that not
> as much as without specifying the threshold manually; probably this manual
> setting prevents other optimizations to quick in).
>
> As a final check, if we use other malloc systems, like the excellent
> mimalloc (https://github.com/microsoft/mimalloc), we can get really good
> performance for the two parts:
>
> $ LD_PRELOAD=/usr/local/lib/libmimalloc.so  python mmap-numpy.py
> numpy version 1.20.3
>
>  147.5968 microseconds
>  146.9028 microseconds
>  147.1794 microseconds
>
>  148.0905 microseconds
>  147.7667 microseconds
>  147.5180 microseconds
>
> However, as this is avoiding the mmap() calls, this approach probably uses
> more memory, specially when large arrays need to be handled.
>
> All in all, this is testimonial of how much memory handling can affect
> performance in modern computers.  Perhaps it is time for testing different
> memory allocation strategies in NumPy and come up with suggestions for
> users.
>
> Francesc
>
>
>
>>
>>
>>>
>>>
>>> On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <
>>> sebast

[Numpy-discussion] Re: Feature request: function to get minimum and maximum values simultaneously (as a tuple)

2022-06-30 Thread Warren Weckesser
On 6/30/22, Ewout ter Hoeven  wrote:
> A function to get the minimum and maximum values of an array simultaneously
> could be very useful, from both a convenience and performance point of view.
> Especially when arrays get larger the performance benefit could be
> significant, and even more if the array doesn't fit in L2/L3 cache or even
> memory.
>
> There are many cases where not either the minimum or the maximum of an array
> is required, but both. Think of clipping an array, getting it's range,
> checking for outliers, normalizing, making a plot like a histogram, etc.
>
> This function could be called aminmax() for example, and also be called like
> ndarray.minmax(). It should return a tuple (min, max) with the minimum and
> maximum values of the array, identical to calling (ndarray.min(),
> ndarray.max()).
>
> With such a function, numpy.ptp() and the special cases of numpy.quantile(a,
> q=[0,1]) and numpy.percentile(a, q=[0,100]) could also potentially be
> speeded up, among others.
>
> Potentially argmin and argmax could get the same treatment, being called
> argminmax().
>
> There is also a very extensive post on Stack Overflow (a bit old already)
> with discussion and benchmarks:
> https://stackoverflow.com/questions/12200580/numpy-function-for-simultaneous-max-and-min


FYI, I have a fairly simple gufunc implementation of `minmax` in
ufunclab (https://github.com/WarrenWeckesser/ufunclab),  along with
`arg_minmax`, `min_argmin` and `max_argmax`.  See README.md starting
here: https://github.com/WarrenWeckesser/ufunclab#minmax

For those familiar with C and gufunc implementation details, you can
find the implementations in
https://github.com/WarrenWeckesser/ufunclab/blob/main/src/minmax/minmax_gufunc.c.src.
  You'll see that, as far as gufuncs go, these are not very
sophisticated.  They do not include implementations for all the NumPy
data types, and  I haven't yet spent much time on optimization.

Warren


> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Representation of NumPy scalars

2022-09-08 Thread Warren Weckesser
On 9/8/22, Andrew Nelson  wrote:
> On Thu, 8 Sept 2022, 19:42 Sebastian Berg, 
> wrote:
>
>>
>> TL;DR:  NumPy scalars representation is e.g. `34.3` instead of
>> `float32(34.3)`.  So the representation is missing the type
>> information.  What are your thoughts on changing that?


I like the idea, but as others have noted, this could result in a lot
of churn in the docs of many projects.


>
>
>> From the Python documentation on repr:
>
>
> From the Python documentation on repr:
>
> "this should look like a valid Python expression that could be used to
> recreate an object with the same value"


To quote from https://docs.python.org/3/library/functions.html#repr:

> For many types, this function makes an attempt to return a string
> that would yield an object with the same value when passed to eval();

Sebastian, is this an explicit goal of the change?  (Personally, I've
gotten used to not taking this too seriously, but my world view is
biased by the long-term use of NumPy, which has never followed this
guideline.)

If that is a goal, than the floating point types with precision
greater than double precision will need to display the argument of the
type as a string.  For example, the following is run on a platform
where numpy.longdouble is extended precision (80 bits):

```
In [161]: longpi = np.longdouble('3.14159265358979323846')

In [162]: longpi
Out[162]: 3.1415926535897932385

In [163]: np.longdouble(3.1415926535897932385)  # Argument is parsed
as 64 bit float
Out[163]: 3.141592653589793116

In [164]: np.longdouble('3.1415926535897932385')  # Correctly
reproduces the longdouble
Out[164]: 3.1415926535897932385
```

Warren

>
> I think it definitely we should definitely have:
>
> repr(np.float32(34.3)) == 'float32(34.3)'
> And
> str(np.float32(34.3)) == '34.3'
>
> It seems buglike not to have that.
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Addition of useful new functions from the array API specification

2022-12-12 Thread Warren Weckesser
On 12/12/22, Aaron Meurer  wrote:
> On Mon, Dec 12, 2022 at 8:46 AM Sebastian Berg
>  wrote:
>>
>> On Wed, 2022-12-07 at 14:21 -0700, Aaron Meurer wrote:
>> > Hi all.
>> >
>> > As discussed in today's community meeting, I plan to start working on
>> > adding some useful functions to NumPy which are part of the array API
>> > standard https://data-apis.org/array-api/latest/index.html.
>> >
>> > Although these are all things that will be needed for NumPy to be
>> > standard compliant, my focus for now at least is going to be on new
>> > functionality that is useful for NumPy independent of the standard.
>> > The things that I (and possibly others) plan on working on are:
>>
>>
>> Generally, I don't have much opinion on these, most seem fine to me.
>> The pure aliases/shortforms, I feel should maybe be discussed
>> separately.
>>
>> * `np.linalg.matrix_transpose` (basically an alias/replacement for
>>   `np.linalg.transpose).  (No strong opinion from me, the name is
>>a bit clearer.)
>>   Are you proposing to add `np.linalg.matrix_transpose` or also
>>   `np.matrix_transpose`?
>
> The spec has the function in both namespaces, so that is the proposal
> (my PR https://github.com/numpy/numpy/pull/22767 only adds it to
> linalg for now because I wasn't sure the correct way to add it to np).
>
>>
>> * `ndarray.mT`, I don't have an opinion on it.  At some point I would
>>   have preferred transitioning `ndarray.T` to be this, but...
>>
>> * Named tuples for tuple results (in linalg, such as `eigh`).
>>   I suppose this should be backwards compatible, and thus a simple
>>   improvement.
>>
>> * vecdot: I guess we have vdot, but IIRC that has other semantics
>>   so this mirrors `matmul` and avoids multi-signature functions.
>>   (It would be good if this is a proper gufunc, probably).
>>
>> * copy=... argument for reshape.  I like that.  An important step here
>>   is to also add a FutureWarning to the `copy=` in `np.array()`.
>>
>> * `matrix_norm` and `vector_norm` seem OK to me.  I guess only
>>   `matrix_norm` would be a proper gufunc unfortunately, while
>>   `vector_norm` would be almost the same as norm.
>>   In either case `matrix_norm` seems a bit tedious right now and
>>   `vector_norm` probably adds functionality since multiple axes
>>   are probably valid.
>
> Why can't vector_norm be a gufunc?
>

For what it's worth, I implemented vector norm and vector dot as
gufuncs in ufunclab:

* https://github.com/WarrenWeckesser/ufunclab#vnorm
* https://github.com/WarrenWeckesser/ufunclab#vdot

Warren


> Aaron Meurer
>
>>
>>
>> - Sebastian
>>
>>
>> PS: For the `ndarray.H` proposal, "its complicated" is maybe too fuzzy:
>> The complexity is about not being able to return a view for complex
>> numbers.  That is `.H` is:
>>
>> * maybe slightly more expensive than may be expected for an attribute
>> * different for real values, which could return a view
>> * a potential problem if we would want to return a view in the future
>>
>> So we need some answer to those worries to have a chance at pushing it
>> forward unfortunately.  (Returning something read-only could reduce
>> some of those worries?  Overall, they probably cannot be quite removed
>> though, just argued to be worthwhile?)
>>
>>
>>
>>
>> >
>> > - A new function matrix_transpose() and corresponding ndarray
>> > attribute x.mT. Unlike transpose(), matrix_transpose() will require
>> > at
>> > least 2 dimensions and only operate on the last two dimensions (it's
>> > effectively an alias for swapaxes(x, -1, -2)). This was discussed in
>> > the past at https://github.com/numpy/numpy/issues/9530 and
>> > https://github.com/numpy/numpy/issues/13797. See
>> >
>> > https://data-apis.org/array-api/latest/API_specification/generated/signatures.linear_algebra_functions.matrix_transpose.html
>> >
>> > - namedtuple outputs for eigh, qr, slogdet and svd. This would only
>> > apply to the instances where they currently return a tuple (e.g.,
>> > svd(compute_uv=False) would still just return an array). See the
>> > corresponding pages at
>> > https://data-apis.org/array-api/latest/extensions/index.html for the
>> > namedtuple names. These four functions are the ones that are part of
>> > the array API spec, but if there are other functions that aren't part
>> > of the spec which we'd like to update to namedtuples as well for
>> > consistency, I can look into that.
>> >
>> > - New functions matrix_norm() and vector_norm(), which split off the
>> > behavior of norm() between vector and matrix specific
>> > functionalities.
>> > This is a cleaner API and would allow these functions to be proper
>> > gufuncs. See
>> > https://data-apis.org/array-api/latest/extensions/generated/signatures.linalg.vector_norm.html
>> > and
>> > https://data-apis.org/array-api/latest/extensions/generated/signatures.linalg.matrix_norm.html
>> > .
>> >
>> > - New function vecdot() which does a broadcasted 1-D dot product
>> > along
>> > a specified axis
>> >
>> > https://data-apis.org/array-api/la

[Numpy-discussion] Re: Giving deprecation of e.g. `float(np.array([1]))` a shot (not 0-d)

2023-04-20 Thread Warren Weckesser
On 4/20/23, Sebastian Berg  wrote:
> Hi all,
>
> Unlike conversions of 0-d arrays via:
>
> float(np.array([1]))
>
> conversions of 1-D or higher dimensional arrays with a single element
> are a bit strange:
>
> float(np.array([1]))
>
> And deprecating it has come up often enough with many in favor, but
> also many worried about the possible annoyance to users.
> I decided to give the PR a shot, I may have misread the room on it
> though:
>
> https://github.com/numpy/numpy/pull/10615
>
> So if this turns out noisy (or you may simply disagree), I am happy to
> revert!
>
> There was always the worry that it might be painful for downstream.
> SciPy, pandas, matplotlib should all be fine (were fixed in the past
> years).  And the fact that SciPy required much more changes than the
> other gives me some hope that many libraries won't mind.
>
> For end-users, I would lean towards taking it slow, but if you see
> issues there we can also revert of course.
>
> Cheers,
>
> Sebastian
>
>

Thanks Nico, and Sebastian, and everyone else involved in the PRs.

This also affects `np.float64`:

```
In [61]: np.__version__
Out[61]: '1.25.0.dev0+1203.g1acac891f'

In [62]: np.float64(0.0)
Out[62]: 0.0

In [63]: np.float64(np.array(0.0))
Out[63]: 0.0

In [64]: np.float64(np.array([0.0]))
:1: DeprecationWarning: Conversion of
an array with ndim > 0 to a scalar is deprecated, and will error in
future. Ensure you extract a single element from your array before
performing this operation. (Deprecated NumPy 1.25.)
  np.float64(np.array([0.0]))
Out[64]: 0.0

In [65]: np.float64(np.array([0.0, 0.0]))
Out[65]: array([0., 0.])

```

In 1.24.2, `np.float64(np.array([0.0])` returns the the scalar 0.0.

If passing arrays to `np.float64()` is intentionally supported, it
seems it would be more consistent for `np.float64(np.array([0.0]))` to
return `np.array([0.0])`.  That is how the other numpy types work
(e.g. `np.complex128`, `np.int64`, etc.). But I'm not sure if there is
a deprecation/update path that would get us there.

Warren

>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Giving deprecation of e.g. `float(np.array([1]))` a shot (not 0-d)

2023-05-16 Thread Warren Weckesser
On 4/21/23, Sebastian Berg  wrote:
> On Thu, 2023-04-20 at 20:17 +0200, Sebastian Berg wrote:
>> On Thu, 2023-04-20 at 13:59 -0400, Warren Weckesser wrote:
>> > On 4/20/23, Sebastian Berg  wrote:
>> > > Hi all,
>> > >
>> > >
>>
>> 
>>
>> >
>> > In [64]: np.float64(np.array([0.0]))
>> > :1: DeprecationWarning: Conversion
>> > of
>> > an array with ndim > 0 to a scalar is deprecated, and will error in
>> > future. Ensure you extract a single element from your array before
>> > performing this operation. (Deprecated NumPy 1.25.)
>> >   np.float64(np.array([0.0]))
>> > Out[64]: 0.0
>> >
>> > In [65]: np.float64(np.array([0.0, 0.0]))
>> > Out[65]: array([0., 0.])
>
>
> Do you have any thoughts on how to make progress Warren?
>

Sorry for the late reply; the recent comment in
https://github.com/numpy/numpy/issues/23400 reminded me of this. As
noted in the link in the recent comment in that issue, handling of
nonscalar inputs of the numpy scalar types was also briefly discussed
in the mailing list three years ago:
https://mail.python.org/pipermail/numpy-discussion/2020-April/080566.html

I don't have any concrete ideas other than outright deprecating the
handling of anything that is not a scalar, but that might be too
disruptive.

Warren


> Had a bit of a look at it.  You are probably aware that this is because
> for float, str, and bytes (our subclasses of them), we have
> (approximately):
>
> def __new__(cls, *args, **kwargs):
> try:
> super().__new__(*args, **kwargs)
> except:
> if len(args) != 1 or kwargs != {}:
> raise
>
> return np.asarray(args[0])[()]  # scalar if 0-D
>
>
> For float64, I am tempted to just remove the super() path entirely and
> put in a fast-path for simple scalar object (like python `int`,
> `float`, `bool`, `str`) to avoid the full `np.asarray()` call.
>
>
> For unicode/bytes its a bit of a mess though?  I suspect for them the
> `array` path is currently just useless in practice, because even arrays
> are interpreted as scalars here.
>
> The best path might be even to just deprecate array input entirely for
> them?  Even then you have at least one case that is tricky:
>
> np.bytes_(5)
>
> returns an empty string (since we strip zeros) but if we would do the
> same as `np.asarray(5, dtype=np.bytes_)[()]` we would get a different
> result.
> (And raising on a non 0-D array doesn't help there.)
>
> Maybe the right way is to go as far and check if both paths match for
> non-trivial bytes?!
>
> - Sebastian
>
>
>> >
>>
>> Hmmmpf, that would be a good follow-up to fix.  In theory a
>> FutureWarning I guess (returning the array), but in practice, I think
>> we should just give the correct array result.
>>
>> (I don't love returning arrays from scalar constructors, but that is
>> another thing and not for now.)
>>
>> - Sebsatian
>>
>>
>> > ```
>> >
>> > In 1.24.2, `np.float64(np.array([0.0])` returns the the scalar 0.0.
>> >
>> > If passing arrays to `np.float64()` is intentionally supported, it
>> > seems it would be more consistent for `np.float64(np.array([0.0]))`
>> > to
>> > return `np.array([0.0])`.  That is how the other numpy types work
>> > (e.g. `np.complex128`, `np.int64`, etc.). But I'm not sure if there
>> > is
>> > a deprecation/update path that would get us there.
>> >
>> > Warren
>> >
>> > >
>> > > ___
>> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
>> > > To unsubscribe send an email to numpy-discussion-le...@python.org
>> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> > > Member address: warren.weckes...@gmail.com
>> > >
>> > ___
>> > NumPy-Discussion mailing list -- numpy-discussion@python.org
>> > To unsubscribe send an email to numpy-discussion-le...@python.org
>> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> > Member address: sebast...@sipsolutions.net
>> >
>>
>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: sebast...@sipsolutions.net
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-18 Thread Warren Weckesser
On Fri, Aug 18, 2023 at 4:59 AM Ronald van Elburg <
r.a.j.van.elb...@hetnet.nl> wrote:

> I was trying to get a feel for how often the work around occurs. I found
> three clear examples in Scipy and one unclear case. One case in holoviews.
> Two in numpy. One from soundappraisal's code base.
>

See also my comment from back in 2020:
https://github.com/numpy/numpy/pull/14542#issuecomment-586494608

Anyone interested in this enhancement is encouraged to review the
discussion in that pull request (https://github.com/numpy/numpy/pull/14542),
and an earlier issue from 2015: https://github.com/numpy/numpy/issues/6044

Warren



> Next to prepending to the output, I also see prepending to the input as a
> workaround.
>
> Some examples of workarounds:
>
> scipy: (prepending to the output)
>
> scipy/scipy/sparse/construct.py:
>
> '''Python
> row_offsets = np.append(0, np.cumsum(brow_lengths))
> col_offsets = np.append(0, np.cumsum(bcol_lengths))
> '''
>
> scipy/scipy/sparse/dia.py:
>
> '''Python
> indptr = np.zeros(num_cols + 1, dtype=idx_dtype)
> indptr[1:offset_len+1] = np.cumsum(mask.sum(axis=0))
> '''
>
> scipy/scipy/sparse/csgraph/_tools.pyx:
>
> '''Python
> indptr = np.zeros(N + 1, dtype=ITYPE)
> indptr[1:] = mask.sum(1).cumsum()
> '''
>
> Not sure whether this is also an example:
>
> scipy/scipy/stats/_hypotests_pythran.py
> '''Python
> # Now fill in the values. We cannot use cumsum, unfortunately.
> val = 0.0 if minj == 0 else 1.0
> for jj in range(maxj - minj):
> j = jj + minj
> val = (A[jj + minj - lastminj] * i + val * j) / (i + j)
> A[jj] = val
> '''
>
> holoviews: (prepending to the input)
>
> '''Python
> # We add a zero in the begging for the cumulative sum
> points = np.zeros((areas_in_radians.shape[0] + 1))
> points[1:] = areas_in_radians
> points = points.cumsum()
> '''
>
>
> numpy (prepending to the input):
>
> numpy/numpy/lib/_iotools.py :
>
> '''Python
> idx = np.cumsum([0] + list(delimiter))
> '''
>
> numpy/numpy/lib/histograms.py
>
> '''Python
> cw = np.concatenate((zero, sw.cumsum()))
> '''
>
>
>
> soundappraisal own code: (prepending to the output)
>
> '''Python
> def get_cumulativepixelareas(whiteboard):
> whiteboard['cumulativepixelareas'] = \
> np.concatenate((np.array([0, ]),
> np.cumsum(whiteboard['pixelareas'])))
> return True
> '''
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: warren.weckes...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Tricky ufunc implementation question

2025-06-27 Thread Warren Weckesser via NumPy-Discussion
On Fri, Jun 27, 2025 at 5:29 PM Benjamin Root via NumPy-Discussion
 wrote:
>
> I'm looking at a situation where I like to wrap a C++ function that takes two 
> doubles as inputs, and returns an error code, a position vector, and a 
> velocity vector so that I essentially would have a function signature of (N), 
> (N) -> (N), (N, 3), (N, 3). When I try to use np.vectorize() or 
> np.frompyfunc() on the python version of this function, I keep running into 
> issues where it wants to make the outputs into object arrays of tuples. And 
> looking at utilizing PyUFunc_FromFuncAndData, it isn't clear to me how I can 
> tell it to expect those two output arrays to have a size 3 outer dimension.
>
> Are ufuncs the wrong thing here? How should I go about this? Is it even 
> possible?

Ben,

It looks like the simplest signature for your core operation would be
(),()->(),(3),(3), with broadcasting taking care of higher dimensional
inputs.  Because not all the core shapes are scalars, that would
require a *generalized* ufunc (gufunc).  There is an open issue
(https://github.com/numpy/numpy/issues/14020) with a request for a
function to generate a gufunc from a Python function.

numba has the @guvectorize decorator, but I haven't use it much, and
in my few quick attempts just now, it appeared to not accept fixed
integer sizes in the output shape.  But wait to see if any numba gurus
respond with a definitive answer about whether or not it can handle
the shape signature (),()->(),(3),(3).

You could implement the gufunc in a C or C++ extension module, if you
don't mind the additional development effort and packaging hassle.  I
know that works--I've implemented quite a few gufuncs in ufunclab
(https://github.com/WarrenWeckesser/ufunclab).

Warren


>
> Thanks in advance,
> Ben Root
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: warren.weckes...@gmail.com
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: arch...@mail-archive.com