Re: [Numpy-discussion] Inclusion of licenses
On Thu, Dec 10, 2020 at 6:42 AM Charles R Harris wrote:

> On Wed, Dec 9, 2020 at 5:17 PM Charles R Harris wrote:
>
>> Hi All,
>>
>> Currently we append appropriate platform licenses to the LICENSE.txt file when building wheels for release. This means that there are uncommitted changes, which show up in the versioneer version as 'dirty'; see the nightly files. This is unfortunate, but accurate :) There are at least two possible solutions to this problem.
>>
>>   1. Patch versioneer to omit the dirty string, very easy to do.
>>   2. Put the platform specific file in the repo or combine them in the LICENSE file.
>>
>> I don't recall why we did things the way we do, but there was a discussion. Patching is easy, but the second option seems preferable. In particular, folks who now build their own NumPy wheels aren't going to have the license files.
>>

The reason for that construct is that GitHub won't recognize the license if we add vendored info. As a result, GitHub would not only fail to display the license in its UI, but its API for querying a repo's license would also return the wrong result. That in turn throws off Tidelift, which uses two sources of licensing info in its service (GitHub and libraries.io), and those should match.

Please consider this an issue with versioneer, and choose (1).

> Note that LICENSES_bundled.txt, excluded from the sdist in MANIFEST.in, is included in the wheel in the dist-info file.
>

Ah, that needs fixing then.

Cheers,
Ralf

> charris@fc [numpy.git (master)]$ ls dist/numpy-1.21.0.dev0+135.g26f8b11b6e.dist-info
> entry_points.txt  LICENSES_bundled.txt  LICENSE.txt  METADATA  RECORD
> top_level.txt  WHEEL
>
> Looks like any LICENSE* files in the root directory will be included in the wheel.
>
> Chuck

___
NumPy-Discussion mailing list
https://mail.python.org/mailman/listinfo/numpy-discussion
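Option (1) amounts to rendering the version as if the working tree were clean. The function below is a minimal sketch: it is a simplified re-implementation of versioneer's "pep440" rendering style (`TAG[+DISTANCE.gHEX[.dirty]]`), not versioneer's own code. The `omit_dirty` flag is a hypothetical stand-in for the proposed patch.

```python
def render_pep440(closest_tag, distance, short_hash, dirty, omit_dirty=False):
    """Simplified sketch of versioneer's "pep440" style: TAG[+DISTANCE.gHEX[.dirty]]."""
    if omit_dirty:
        # Hypothetical patch for option (1): ignore local modifications, such as
        # license text appended to LICENSE.txt during a wheel build.
        dirty = False
    if closest_tag:
        rendered = closest_tag
        if distance or dirty:
            # Use "+" to start the local version label unless the tag already has one.
            rendered += "." if "+" in closest_tag else "+"
            rendered += f"{distance}.g{short_hash}"
            if dirty:
                rendered += ".dirty"
    else:
        # No reachable tag at all.
        rendered = f"0+untagged.{distance}.g{short_hash}"
        if dirty:
            rendered += ".dirty"
    return rendered


print(render_pep440("1.21.0.dev0", 135, "26f8b11b6e", dirty=True))
# 1.21.0.dev0+135.g26f8b11b6e.dirty
print(render_pep440("1.21.0.dev0", 135, "26f8b11b6e", dirty=True, omit_dirty=True))
# 1.21.0.dev0+135.g26f8b11b6e
```

The trade-off is that a checkout with genuine local edits would no longer be flagged either. This gives up the "accurate" part of the original remark.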
Re: [Numpy-discussion] Inclusion of licenses
On Thu, Dec 10, 2020 at 2:35 AM Ralf Gommers wrote:

> On Thu, Dec 10, 2020 at 6:42 AM Charles R Harris wrote:
>
>> On Wed, Dec 9, 2020 at 5:17 PM Charles R Harris wrote:
>>
>>> Hi All,
>>>
>>> Currently we append appropriate platform licenses to the LICENSE.txt file when building wheels for release. This means that there are uncommitted changes, which show up in the versioneer version as 'dirty'; see the nightly files. This is unfortunate, but accurate :) There are at least two possible solutions to this problem.
>>>
>>>   1. Patch versioneer to omit the dirty string, very easy to do.
>>>   2. Put the platform specific file in the repo or combine them in the LICENSE file.
>>>
>>> I don't recall why we did things the way we do, but there was a discussion. Patching is easy, but the second option seems preferable. In particular, folks who now build their own NumPy wheels aren't going to have the license files.
>>>
>
> The reason for that construct is that GitHub won't recognize the license if we add vendored info. As a result, GitHub would not only fail to display the license in its UI, but its API for querying a repo's license would also return the wrong result. That in turn throws off Tidelift, which uses two sources of licensing info in its service (GitHub and libraries.io), and those should match.
>
> Please consider this an issue with versioneer, and choose (1).
>
>> Note that LICENSES_bundled.txt, excluded from the sdist in MANIFEST.in, is included in the wheel in the dist-info file.
>>
>
> Ah, that needs fixing then.
>

Seems setup can be called with an option to use MANIFEST.in; I'll experiment a bit. Since the bundled license is only included in `dist-info`, it may also be a bug in setuptools.

Chuck
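The observation that every LICENSE* file in the root directory ends up in `dist-info` is consistent with the default license globs of `wheel`'s `bdist_wheel`. As far as I know, at the time it applied these when `setup.cfg` set neither `license_file` nor `license_files`: `LICEN[CS]E*`, `COPYING*`, `NOTICE*` and `AUTHORS*`. That selection does not consult MANIFEST.in, which only governs the sdist. Below is a minimal sketch of the matching with `fnmatch`, treating those patterns as an assumption. The helper name and the directory listing are hypothetical.

```python
import fnmatch

# Assumed defaults (wheel's bdist_wheel when no license_files option is set).
DEFAULT_LICENSE_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def license_files_for_wheel(root_files, patterns=DEFAULT_LICENSE_PATTERNS):
    """Return the root-level files that would be copied into dist-info (sketch)."""
    selected = []
    for name in sorted(root_files):
        if name.endswith("~"):  # editor backup files are skipped
            continue
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(name)
    return selected


# Hypothetical listing of a source tree's root directory.
root = ["LICENSE.txt", "LICENSES_bundled.txt", "MANIFEST.in", "README.md", "setup.py"]

print(license_files_for_wheel(root))
# ['LICENSE.txt', 'LICENSES_bundled.txt']

# Narrowing the patterns keeps the bundled-license file out.
print(license_files_for_wheel(root, patterns=("LICENSE.txt",)))
# ['LICENSE.txt']
```

If this reading is right, the lever for keeping `LICENSES_bundled.txt` out of the wheel is narrowing the license patterns, not editing MANIFEST.in. One such route would be a `license_files` entry in the `[metadata]` section of `setup.cfg`.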
Re: [Numpy-discussion] np.{bool,float,int} deprecation
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer wrote:
>
> > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg wrote:
> > >
> > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
> > > >
> > > > Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
> > >
> > > We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
> > >
> > > I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
> >
> > NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
> >
>
> It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.

That is true. `int` is probably the most confusing, since it is not at all compatible with a Python integer, but rather is the "default" integer (which happens to be the same as C `long` currently).

So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?

Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
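You can check directly how the builtin spellings discussed here resolve as dtype specifiers. This is a short sketch, assuming a reasonably recent NumPy. It deliberately avoids `np.int`, `np.float`, `np.float_` and `np.bool`, whose availability differs between releases.

```python
import numpy as np

# Python builtins used as dtype specifiers map onto NumPy's default types.
assert np.dtype(bool) == np.dtype(np.bool_)          # the one boolean dtype
assert np.dtype(float) == np.dtype(np.float64)       # always double precision
assert np.dtype(complex) == np.dtype(np.complex128)
assert np.dtype(int) == np.dtype(np.int_)            # the "default" integer

# Of these, only the default integer's width can differ between platforms.
default_int = np.dtype(int)
print(f"dtype=int -> {default_int.name} ({default_int.itemsize} bytes)")
```

On the NumPy releases discussed in this thread, the default integer matched C `long`. That is why its width was not the same on every platform.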
Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even though (or precisely because) the default integer on many systems is currently `np.int_`?

Cheers,

Sebastian

>
> np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integers and floats, so this distinction matters.
>
> In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type promotion, where libraries like JAX and PyTorch have special-case logic for most Python types vs NumPy dtypes (but booleans are the same for both):
> https://jax.readthedocs.io/en/latest/type_promotion.html
>
> >
> > Aaron Meurer
> >
> > > OTOH, if someone wants to entertain switching... It could be interesting to see how (unfixed) downstream projects react to it.
> > >
> > > One approach would be:
> > >
> > > * Go ahead for now (deprecate)
> > > * Add a FutureWarning at some point that we _will_ start to export `np.bool` again (but `from numpy import *` is a problem?)
> > > * Aim to make `np.bool is np.bool_` at some point in the (far) future.
> > >
> > > It is multi-step (and I recall opinions that multi-step is bad). Although, I think the main argument against it was to not force users to modify code more than once. And I do not think that happens here.
> > >
> > > Of course we could use the `FutureWarning` right away, but I don't mind taking it slow.
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > > [1] I admit, probably almost nobody would notice. And usually using a Python `bool` is better...
> > >
> > > > Aaron Meurer
> > > >
> > > > On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias wrote:
> > > > > Hi all,
> > > > >
> > > > > At the prodding [1] of Sebastian, I’m starting a discussion on the decision to deprecate np.{bool,float,int}. This deprecation broke our prerelease testing in scikit-image (which, hooray for rcs!), and resulted in a large amount of code churn to fix [2].
> > > > >
> > > > > To be honest, I do think *some* sort of deprecation is needed, because for the longest time I thought that np.float was what np.float_ actually is. I think it would be worthwhile to move to *that*, though it’s an even more invasive deprecation than the currently proposed one. Writing `x = np.zeros(5, dtype=int)` is somewhat magical, because someone with a strict typing mindset (there’s an increasing number!) might expect that t
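The staged plan above deprecates first and possibly announces a change of meaning with a `FutureWarning` later. It needs a way to warn when a module attribute is looked up, and one way to do that is a module-level `__getattr__` (PEP 562). The sketch below uses a hypothetical module `demo_numpy` and a hypothetical alias table. It is not NumPy's actual code, although as far as I know NumPy's own deprecation is built on a similar hook.

```python
import builtins
import types
import warnings

# Hypothetical stand-in for a package namespace such as numpy/__init__.py.
demo = types.ModuleType("demo_numpy")

# alias name -> (object returned, warning category, message)
_DEPRECATED_ALIASES = {
    "int": (builtins.int, DeprecationWarning,
            "`demo_numpy.int` is a deprecated alias for the builtin `int`."),
    "float": (builtins.float, DeprecationWarning,
              "`demo_numpy.float` is a deprecated alias for the builtin `float`."),
    # Later stage (hypothetical): announce that the name will change meaning.
    "bool": (builtins.bool, FutureWarning,
             "`demo_numpy.bool` will refer to the NumPy boolean type in the future."),
}


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails (PEP 562)."""
    try:
        value, category, message = _DEPRECATED_ALIASES[name]
    except KeyError:
        raise AttributeError(
            f"module {demo.__name__!r} has no attribute {name!r}") from None
    warnings.warn(message, category, stacklevel=2)
    return value


# Assigning into the module's namespace makes it the module-level __getattr__.
demo.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    spec = demo.int  # still works, but warns
print(spec is int, caught[0].category.__name__)
# True DeprecationWarning
```

Switching a single table entry from `DeprecationWarning` to `FutureWarning` models the "multi-step" idea without touching the lookup logic.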
Re: [Numpy-discussion] np.{bool,float,int} deprecation
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg wrote:

> On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer wrote:
> >
> > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg wrote:
> > > >
> > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
> > > > >
> > > > > Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
> > > >
> > > > We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
> > > >
> > > > I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
> > >
> > > NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
> > >
> >
> > It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.

I'd agree with that.

> That is true. `int` is probably the most confusing, since it is not at all compatible with a Python integer, but rather is the "default" integer (which happens to be the same as C `long` currently).
>
> So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?

Not sure what you mean by focus; focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.

> Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
>
> Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`.
> For integers, I wonder if we should also suggest `np.int64`, even though (or precisely because) the default integer on many systems is currently `np.int_`?

I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as

    dtype=bool  # or np.bool
    dtype=np.float64
    dtype=np.int64
    dtype=np.complex128

The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc.) made sense 15 years ago, but with the user base of today (the majority of whom will not know C fluently or at all) they also don't make too much sense.

The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so it shouldn't be recommended and probably deserves a warning in the docs.

Cheers,
Ralf

> >
> > np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integers and floats, so this distinction matters.
> >
> > In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type promotion, where libraries like JAX and PyTorch have special-case logic for most Python types vs NumPy dtypes (but booleans are the same for both):
> > https://jax.readthedocs.io/en/latest/type_promotion.html
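The recommended spellings put the width in the name, while `dtype=int` follows the platform's default integer. This small sketch contrasts them. Its last lines use an explicit `np.int32` to show the silent wraparound that a 32-bit default would produce at its limit.

```python
import numpy as np

# Descriptive spellings: the width is part of the name, on every platform.
recommended = {
    "bool": np.zeros(3, dtype=bool),
    "float64": np.zeros(3, dtype=np.float64),
    "int64": np.zeros(3, dtype=np.int64),
    "complex128": np.zeros(3, dtype=np.complex128),
}
for spelling, arr in recommended.items():
    print(f"{spelling:>10}: {arr.dtype} ({arr.dtype.itemsize} bytes)")

# dtype=int: the width comes from the platform's default integer instead.
default_int = np.zeros(3, dtype=int)
print(f"{'int':>10}: {default_int.dtype} ({default_int.dtype.itemsize} bytes)")

# What a 32-bit integer does at its limit: array arithmetic wraps silently.
wrapped = np.array([2**31 - 1], dtype=np.int32) + 1
print(wrapped)  # [-2147483648]
```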
Re: [Numpy-discussion] np.{bool,float,int} deprecation
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg wrote:
>
> > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer wrote:
> > >
> > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg wrote:
> > > > >
> > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.
> > > > > >
> > > > > > Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.
> > > > >
> > > > > We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).
> > > > >
> > > > > I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]
> > > >
> > > > NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.
> > > >
> > >
> > > It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.
>
> I'd agree with that.
>
> > That is true. `int` is probably the most confusing, since it is not at all compatible with a Python integer, but rather is the "default" integer (which happens to be the same as C `long` currently).
> >
> > So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?
>
> Not sure what you mean by focus; focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise.
>

I meant limiting the current deprecation to `np.int`, maybe `np.long`, and a "carefully chosen" set. To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names). There are probably more doc updates that would be nice; I will suggest those in a separate issue.

> > Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.
> >
> > Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`.
> > For integers, I wonder if we should also suggest `np.int64`, even though (or precisely because) the default integer on many systems is currently `np.int_`?
>
> I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as
>
>     dtype=bool  # or np.bool
>     dtype=np.float64
>     dtype=np.int64
>     dtype=np.complex128
>
> The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc.) made sense 15 years ago, but with the user base of today (the majority of whom will not know C fluently or at all) they also don't make too much sense.
>
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so it shouldn't be recommended and probably deserves a warning in the docs.

Right, there is one slight complication: `np.intp` is often a great integer dtype to use, because it is the integer that NumPy uses for all things related to indexing and array sizes. (I would be happy to dig out my PR making `np.intp` the default NumPy integer.)

Cheers,

Sebastian

> Cheers,
> Ralf
>
> > > np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integers and floats, so this distinction matters.
> > >
> > > In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type promotion, where libraries like JAX and PyTorch have s
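The point about `np.intp` can be checked directly. It is the pointer-sized integer, and index-producing functions such as `np.argsort` and `np.nonzero` return arrays of it. A short sketch:

```python
import struct

import numpy as np

# np.intp has the size of a C pointer ("P" in the struct module).
pointer_bytes = struct.calcsize("P")
print("pointer:", pointer_bytes, "bytes; intp:", np.dtype(np.intp).itemsize, "bytes")

a = np.array([30, 10, 20])
order = np.argsort(a)            # indices come back as intp
(where,) = np.nonzero(a > 15)    # so do the positions found by nonzero
print(order, order.dtype == np.intp)   # [1 2 0] True
print(where, where.dtype == np.intp)   # [0 2] True

# intp is also a natural dtype for hand-built index arrays.
idx = np.arange(a.size, dtype=np.intp)
print(a[idx[::-1]])  # [20 10 30]
```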
