Re: [Numpy-discussion] Inclusion of licenses

2020-12-10 Thread Ralf Gommers
On Thu, Dec 10, 2020 at 6:42 AM Charles R Harris 
wrote:

>
>
> On Wed, Dec 9, 2020 at 5:17 PM Charles R Harris 
> wrote:
>
>> Hi All,
>>
>> Currently we append appropriate platform licenses to the LICENSE.txt
>> file when building wheels for release. This means that there are
>> uncommitted changes, which show up in the versioneer version as 'dirty',
>> see the nightly files. This is unfortunate, but accurate :) There are at
>> least two possible solutions to this problem.
>>
>>    1. Patch versioneer to omit the dirty string, very easy to do.
>>    2. Put the platform-specific file in the repo or combine them in the
>>       LICENSE file.
>>
>> I don't recall why we did things the way we do, but there was a
>> discussion. Patching is easy, but the second option seems preferable. In
>> particular, folks who now build their own NumPy wheels aren't going to have
>> the license files.
>>
>
The reason for that construct is that GitHub won't recognize the license if
we add vendored info. As a result, it would not only not display the
license in its UI, but also it provides an API to query the license for a
repo which then gives the wrong result. That in turn throws off Tidelift,
which uses two sources of licensing info in its service (GitHub and
libraries.io) and those should match.

Please consider this an issue with versioneer, and choose (1)
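
To make option (1) concrete, here is a minimal sketch of the idea. It is not versioneer's actual code: the helper name `strip_dirty` is hypothetical, and it only assumes that the dirty marker shows up as a trailing `.dirty` on the rendered version string.

```python
def strip_dirty(version: str) -> str:
    """Drop a trailing '.dirty' marker from a version string.

    Hypothetical helper for illustration; the real change would live in
    the project's copy of versioneer.
    """
    suffix = ".dirty"
    if version.endswith(suffix):
        return version[: -len(suffix)]
    return version


# The version below is the one from the nightly build in this thread,
# with an assumed '.dirty' suffix appended.
cleaned = strip_dirty("1.21.0.dev0+135.g26f8b11b6e.dirty")
print(cleaned)
```

A version string without the marker passes through unchanged, so the helper is safe to apply unconditionally.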

> Note that LICENSES_bundled.txt, excluded from the sdist in MANIFEST.in, is
> included in the wheel in the dist-info directory.
>

Ah, that needs fixing then.

Cheers,
Ralf


> charris@fc [numpy.git (master)]$ ls
> dist/numpy-1.21.0.dev0+135.g26f8b11b6e.dist-info
> entry_points.txt  LICENSES_bundled.txt  LICENSE.txt  METADATA  RECORD
>  top_level.txt  WHEEL
>
> Looks like any LICENSE* files in the root directory will be included in
> the wheel.
>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] Inclusion of licenses

2020-12-10 Thread Charles R Harris
On Thu, Dec 10, 2020 at 2:35 AM Ralf Gommers  wrote:

>
>
> On Thu, Dec 10, 2020 at 6:42 AM Charles R Harris <
> [email protected]> wrote:
>
>>
>>
>> On Wed, Dec 9, 2020 at 5:17 PM Charles R Harris <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> Currently we append appropriate platform licenses to the LICENSE.txt
>>> file when building wheels for release. This means that there are
>>> uncommitted changes, which show up in the versioneer version as 'dirty',
>>> see the nightly files. This is unfortunate, but accurate :) There are at
>>> least two possible solutions to this problem.
>>>
>>>    1. Patch versioneer to omit the dirty string, very easy to do.
>>>    2. Put the platform-specific file in the repo or combine them in the
>>>       LICENSE file.
>>>
>>> I don't recall why we did things the way we do, but there was a
>>> discussion. Patching is easy, but the second option seems preferable. In
>>> particular, folks who now build their own NumPy wheels aren't going to have
>>> the license files.
>>>
>>
> The reason for that construct is that GitHub won't recognize the license
> if we add vendored info. As a result, it would not only not display the
> license in its UI, but also it provides an API to query the license for a
> repo which then gives the wrong result. That in turn throws off Tidelift,
> which uses two sources of licensing info in its service (GitHub and
> libraries.io) and those should match.
>
> Please consider this an issue with versioneer, and choose (1)
>
>> Note that LICENSES_bundled.txt, excluded from the sdist in MANIFEST.in, is
>> included in the wheel in the dist-info directory.
>>
>
> Ah, that needs fixing then.
>
>
Seems setup can be called with an option to use MANIFEST.in, I'll
experiment a bit. Since the bundled license is only included in `dist-info`
it may also be a bug in setuptools.

Chuck
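
For anyone who wants to check a wheel themselves, a small sketch of one way to list the license files that ended up under `dist-info`. The function name `license_files_in_wheel` and the archive contents are made up for illustration; it relies only on a wheel being an ordinary zip archive.

```python
import io
import zipfile


def license_files_in_wheel(wheel) -> list[str]:
    """Return the LICENSE* entries stored under any *.dist-info directory."""
    with zipfile.ZipFile(wheel) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if ".dist-info/" in name
            and name.rsplit("/", 1)[-1].upper().startswith("LICENSE")
        )


# Build a tiny in-memory stand-in for a wheel (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("numpy/__init__.py", "")
    zf.writestr("numpy-1.21.0.dev0.dist-info/METADATA", "")
    zf.writestr("numpy-1.21.0.dev0.dist-info/LICENSE.txt", "")
    zf.writestr("numpy-1.21.0.dev0.dist-info/LICENSES_bundled.txt", "")
buf.seek(0)

found = license_files_in_wheel(buf)
print(found)
```

Passing a path to a real `.whl` file instead of the in-memory buffer works the same way.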


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-10 Thread Sebastian Berg
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> wrote:
> 
> > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> >  wrote:
> > > 
> > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > Regarding np.bool specifically, if you want to deprecate this,
> > > > you
> > > > might want to discuss this with us at the array API standard
> > > > https://github.com/data-apis/array-api (which is currently in
> > > > RFC
> > > > stage). The spec uses bool as the name for the boolean dtype.
> > > > 
> > > > Would it make sense for NumPy to change np.bool to just be the
> > > > boolean
> > > > dtype object? Unlike int and float, there is no ambiguity with
> > > > bool,
> > > > and NumPy clearly doesn't have any issues with shadowing
> > > > builtin
> > > > names
> > > > in its namespace.
> > > 
> > > We could keep the Python alias around (which for `dtype=` is the
> > > same
> > > as `np.bool_`).
> > > 
> > > I am not sure I like the idea of immediately shadowing the
> > > builtin.
> > > That is a switch we can avoid flipping (without warning);
> > > `np.bool_`
> > > and `bool` are fairly different beasts? [1]
> > 
> > NumPy already shadows a lot of builtins, in many cases, in ways
> > that
> > are incompatible with existing ones. It's not something I would
> > have
> > done personally, but it's been this way for a long time.
> > 
> 
> It may be defensible to keep np.bool as an alias for Python's bool
> even
> when we remove the other aliases.

That is true, `int` is probably the most confusing, since it is not at
all compatible with a Python integer, but rather the "default" integer
(which happens to be the same as C `long` currently).

So we could focus on `np.int`, `np.long`.  I am a bit unsure whether
you would prefer that or are mainly pointing out the possibility?


Right now, my main take-away from the discussion is that it would be
good to clarify the release notes a bit more.

Using `float` for a dtype seems fine to me, but I prefer mentioning
`np.float64` over `np.float_`.
For integers, I wonder if we should also suggest `np.int64`, even though
(or precisely because) the default integer on many systems is currently
`np.int_`?
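
As a quick illustration of what the builtin names mean when passed as `dtype=`, a minimal sketch. The variable names are only for this example, and the size of the default integer is deliberately not hard-coded because it depends on the platform and NumPy version.

```python
import numpy as np

# The Python builtins map onto NumPy dtypes when used as dtype specifiers.
float_matches = np.dtype(float) == np.dtype(np.float64)
bool_matches = np.dtype(bool) == np.dtype(np.bool_)
int_matches = np.dtype(int) == np.dtype(np.int_)

# Unlike float and bool, the default integer has no single fixed width.
default_int = np.dtype(int)
print(float_matches, bool_matches, int_matches)
print("default integer:", default_int, "bytes:", default_int.itemsize)
```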

Cheers,

Sebastian



> 
> np.int_ and np.float_ have fixed precision, which makes them somewhat
> different from the builtin types. NumPy has a whole bunch of
> different
> precisions for integer and floats, so this distinction matters.
> 
> In contrast, there is only one boolean dtype in NumPy, which matches
> Python's bool. So we wouldn't have to worry, for example, about
> whether a
> user has requested a specific precision explicitly. This comes up in
> issues
> like type-promotion where libraries like JAX and PyTorch have special
> case
> logic for most Python types vs NumPy dtypes (but booleans are the
> same for
> both):
> https://jax.readthedocs.io/en/latest/type_promotion.html
> 
> 
> 
> > 
> > Aaron Meurer
> > 
> > > OTOH, if someone wants to entertain switching... It could be
> > > interesting to see how (unfixed) downstream projects react to it.
> > > 
> > > One approach would be:
> > > 
> > > * Go ahead for now (deprecate)
> > > * Add a FutureWarning at some point that we _will_ start to
> > > export
> > >   `np.bool` again (but `from numpy import *` is a problem?)
> > > * Aim to make `np.bool is np.bool_` at some point in the (far)
> > > future.
> > > 
> > > It is multi-step (and I recall opinions that multi-step is bad).
> > > Although, I think the main argument against it was to not force
> > > users
> > > to modify code more than once.  And I do not think that happens
> > > here.
> > > 
> > > Of course we could use the `FutureWarning` right away, but I
> > > don't mind
> > > taking it slow.
> > > 
> > > Cheers,
> > > 
> > > Sebastian
> > > 
> > > 
> > > 
> > > [1] I admit, probably almost nobody would notice. And usually
> > > using a
> > > Python `bool` is better...
> > > 
> > > 
> > > > 
> > > > Aaron Meurer
> > > > 
> > > > On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias <
> > > > [email protected]>
> > > > wrote:
> > > > > Hi all,
> > > > > 
> > > > > At the prodding [1] of Sebastian, I’m starting a discussion
> > > > > on the
> > > > > decision to deprecate np.{bool,float,int}. This deprecation
> > > > > broke
> > > > > our prerelease testing in scikit-image (which, hooray for
> > > > > rcs!),
> > > > > and resulted in a large amount of code churn to fix [2].
> > > > > 
> > > > > To be honest, I do think *some* sort of deprecation is
> > > > > needed,
> > > > > because for the longest time I thought that np.float was what
> > > > > np.float_ actually is. I think it would be worthwhile to move
> > > > > to
> > > > > *that*, though it’s an even more invasive deprecation than
> > > > > the
> > > > > currently proposed one. Writing `x = np.zeros(5, dtype=int)`
> > > > > is
> > > > > somewhat magical, because someone with a strict typing
> > > > > mindset
> > > > > (there’s an increasing number!) might expect that t

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-10 Thread Ralf Gommers
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg 
wrote:

> On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> > wrote:
> >
> > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > >  wrote:
> > > >
> > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > Regarding np.bool specifically, if you want to deprecate this,
> > > > > you
> > > > > might want to discuss this with us at the array API standard
> > > > > https://github.com/data-apis/array-api (which is currently in
> > > > > RFC
> > > > > stage). The spec uses bool as the name for the boolean dtype.
> > > > >
> > > > > Would it make sense for NumPy to change np.bool to just be the
> > > > > boolean
> > > > > dtype object? Unlike int and float, there is no ambiguity with
> > > > > bool,
> > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > builtin
> > > > > names
> > > > > in its namespace.
> > > >
> > > > We could keep the Python alias around (which for `dtype=` is the
> > > > same
> > > > as `np.bool_`).
> > > >
> > > > I am not sure I like the idea of immediately shadowing the
> > > > builtin.
> > > > That is a switch we can avoid flipping (without warning);
> > > > `np.bool_`
> > > > and `bool` are fairly different beasts? [1]
> > >
> > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > that
> > > are incompatible with existing ones. It's not something I would
> > > have
> > > done personally, but it's been this way for a long time.
> > >
> >
> > It may be defensible to keep np.bool as an alias for Python's bool
> > even when we remove the other aliases.
>

I'd agree with that.


> That is true, `int` is probably the most confusing, since it is not at
> all compatible with a Python integer, but rather the "default" integer
> (which happens to be the same as C `long` currently).
>
> So we could focus on `np.int`, `np.long`.  I am a bit unsure whether
> you would prefer that or are mainly pointing out the possibility?
>

Not sure what you mean by focus: focus on describing it in the release
notes? Deprecating `np.int` seems like the most beneficial part of this
whole exercise.

> Right now, my main take-away from the discussion is that it would be
> good to clarify the release notes a bit more.
>
> Using `float` for a dtype seems fine to me, but I prefer mentioning
> `np.float64` over `np.float_`.
> For integers, I wonder if we should also suggest `np.int64`, even – or
> because – if the default integer on many systems is currently
> `np.int_`?
>

I agree. I think we should recommend sane, descriptive names that do the
right thing. So ideally we'd have people spell their dtype specifiers as
  dtype=bool  # or np.bool
  dtype=np.float64
  dtype=np.int64
  dtype=np.complex128
The names with underscores at the end make little sense from a UX
perspective. And the C equivalents (single/double/etc) made sense 15 years
ago, but with the user base of today - the majority of whom will not know C
fluently or at all - also don't make too much sense.

The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64
bits is likely to be a pitfall much more often than it is what the user
actually needs, so shouldn't be recommended and probably deserves a warning
in the docs.
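
A small sketch of why the explicit names are the safer recommendation. The example values are mine, chosen to show what happens when the integer turns out to be 32 bits wide.

```python
import numpy as np

# Explicit names always give the same width, on every platform.
a = np.zeros(3, dtype=np.int64)
b = np.zeros(3, dtype=np.float64)
c = np.zeros(3, dtype=np.complex128)
sizes = (a.itemsize, b.itemsize, c.itemsize)
print(sizes)  # (8, 8, 16)

# If the default integer happens to be 32-bit, array arithmetic wraps
# around silently; np.int32 is used here to force that case.
big = np.array([2**31 - 1], dtype=np.int32)
wrapped = big + 1
widened = big.astype(np.int64) + 1
print(wrapped[0], widened[0])
```

The same code with `dtype=int` would give either result depending on where it runs, which is the pitfall described above.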

Cheers,
Ralf


>
> >
> > np.int_ and np.float_ have fixed precision, which makes them somewhat
> > different from the builtin types. NumPy has a whole bunch of
> > different
> > precisions for integer and floats, so this distinction matters.
> >
> > In contrast, there is only one boolean dtype in NumPy, which matches
> > Python's bool. So we wouldn't have to worry, for example, about
> > whether a
> > user has requested a specific precision explicitly. This comes up in
> > issues
> > like type-promotion where libraries like JAX and PyTorch have special
> > case
> > logic for most Python types vs NumPy dtypes (but booleans are the
> > same for
> > both):
> > https://jax.readthedocs.io/en/latest/type_promotion.html
>
>


Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-10 Thread Sebastian Berg
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
> [email protected]>
> wrote:
> 
> > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> > > wrote:
> > > 
> > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > > >  wrote:
> > > > > 
> > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > Regarding np.bool specifically, if you want to deprecate
> > > > > > this,
> > > > > > you
> > > > > > might want to discuss this with us at the array API
> > > > > > standard
> > > > > > https://github.com/data-apis/array-api (which is currently
> > > > > > in
> > > > > > RFC
> > > > > > stage). The spec uses bool as the name for the boolean
> > > > > > dtype.
> > > > > > 
> > > > > > Would it make sense for NumPy to change np.bool to just be
> > > > > > the
> > > > > > boolean
> > > > > > dtype object? Unlike int and float, there is no ambiguity
> > > > > > with
> > > > > > bool,
> > > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > > builtin
> > > > > > names
> > > > > > in its namespace.
> > > > > 
> > > > > We could keep the Python alias around (which for `dtype=` is
> > > > > the
> > > > > same
> > > > > as `np.bool_`).
> > > > > 
> > > > > I am not sure I like the idea of immediately shadowing the
> > > > > builtin.
> > > > > That is a switch we can avoid flipping (without warning);
> > > > > `np.bool_`
> > > > > and `bool` are fairly different beasts? [1]
> > > > 
> > > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > > that
> > > > are incompatible with existing ones. It's not something I would
> > > > have
> > > > done personally, but it's been this way for a long time.
> > > > 
> > > 
> > > It may be defensible to keep np.bool as an alias for Python's
> > > bool
> > > even when we remove the other aliases.
> > 
> 
> I'd agree with that.
> 
> 
> > That is true, `int` is probably the most confusing, since it is not
> > at all compatible with a Python integer, but rather the "default"
> > integer
> > (which happens to be the same as C `long` currently).
> > 
> > So we could focus on `np.int`, `np.long`.  I am a bit unsure
> > whether
> > you would prefer that or are mainly pointing out the possibility?
> > 
> 
> Not sure what you mean with focus, focus on describing in the release
> notes? Deprecating `np.int` seems like the most beneficial part of
> this
> whole exercise.
> 

I meant limiting the current deprecation to `np.int`, maybe `np.long`,
and a "carefully chosen" set.
To be honest, I don't mind either way, so any stronger opinion will tip
the scale for me personally (my default currently is to update the
release notes to recommend the more descriptive names).

There are probably more doc updates that would be nice, I will suggest
updating a separate issue for that.


> > Right now, my main take-away from the discussion is that it would be
> > good to clarify the release notes a bit more.
> > 
> > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > `np.float64` over `np.float_`.
> > For integers, I wonder if we should also suggest `np.int64`, even –
> > or
> > because – if the default integer on many systems is currently
> > `np.int_`?
> > 
> 
> I agree. I think we should recommend sane, descriptive names that do
> the
> right thing. So ideally we'd have people spell their dtype specifiers
> as
>   dtype=bool  # or np.bool
>   dtype=np.float64
>   dtype=np.int64
>   dtype=np.complex128
> The names with underscores at the end make little sense from a UX
> perspective. And the C equivalents (single/double/etc) made sense 15
> years
> ago, but with the user base of today - the majority of whom will not
> know C
> fluently or at all - also don't make too much sense.
> 
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and
> 64
> bits is likely to be a pitfall much more often than it is what the
> user
> actually needs, so shouldn't be recommended and probably deserves a
> warning
> in the docs.

Right, there is one slight complication: `np.intp` is often a great
integer dtype to use, because it is the integer that NumPy uses for all
things related to indexing and array sizes.
(I would be happy to dig out my PR making `np.intp` the default NumPy
integer.)
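
To show what "the integer that NumPy uses for indexing" means in practice, a minimal sketch. The array values are arbitrary, and the pointer-size comparison is an assumption that holds on common platforms rather than a guarantee.

```python
import struct

import numpy as np

x = np.array([30, 10, 20])

# Functions that return indices hand back np.intp arrays.
order = np.argsort(x)
print(order, order.dtype == np.dtype(np.intp))

# On common platforms np.intp is as wide as a pointer.
intp_size = np.dtype(np.intp).itemsize
pointer_size = struct.calcsize("P")
print(intp_size, pointer_size)
```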

Cheers,

Sebastian


> 
> Cheers,
> Ralf
> 
> 
> > 
> > > 
> > > np.int_ and np.float_ have fixed precision, which makes them
> > > somewhat
> > > different from the builtin types. NumPy has a whole bunch of
> > > different
> > > precisions for integer and floats, so this distinction matters.
> > > 
> > > In contrast, there is only one boolean dtype in NumPy, which
> > > matches
> > > Python's bool. So we wouldn't have to worry, for example, about
> > > whether a
> > > user has requested a specific precision explicitly. This comes up
> > > in
> > > issues
> > > like type-promotion where libraries like JAX and PyTorch have
> > > s