[Numpy-discussion] Formally accept NEP 51: Changing the Representation of NumPy Scalars

2022-11-25 Thread Sebastian Berg
Hi all,

I would like to formally propose accepting NEP 51.  Without any concern
voiced, we will consider it accepted within 7 days.

As a reminder, this is to change the representation of NumPy scalars to
be consistent and include the type name.
That means the following representations:

np.float64(6.4) ->  np.float64(6.4)
np.float64(np.nan)  ->  np.float64(nan)

rather than just `6.4` or `nan`.  All scalars would follow this exact
pattern of `np.<type>(value)`.
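
As a rough illustration of that pattern (a sketch only: `nep51_style_repr`
is a hypothetical helper, not NumPy API, and it only covers simple numeric
scalars; booleans, strings and structured scalars have their own details in
the NEP), the proposed form can be built by hand from the type name and the
`str()` of the value:

```python
import numpy as np


def nep51_style_repr(scalar):
    """Build the proposed representation by hand.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # type(scalar).__name__ is e.g. "float64"; str() gives the bare value.
    return f"np.{type(scalar).__name__}({scalar!s})"


print(nep51_style_repr(np.float64(6.4)))     # np.float64(6.4)
print(nep51_style_repr(np.float64(np.nan)))  # np.float64(nan)
print(str(np.float64(6.4)))                  # 6.4 (str() stays unchanged)
```

What `repr()` itself returns depends on the installed NumPy version, which
is why the helper above avoids calling it.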

There are some further details, for these please check the full NEP:

https://numpy.org/neps/nep-0051-scalar-representation.html


For those interested in more details, a few notes:

* To implement the NEP, we need to update NumPy docs.  I plan to
  automate this (mostly) and such automation should also help others.
  (I will make a brief note of this in the NEP.)
  Help with this automation would be greatly appreciated, since this is
  its own project.

* I am not sure that the underscored versions `np.str_` and `np.bool_`
  will be the correct names for long.  If we adjust them, then this
  would propagate to the NEP.

* There are a few implementation details in the NEP that I don't mind
  adjusting.  But I do wish to be pragmatic about progressing if
  there is no clearly formulated alternative.

* Clearly we can always adjust the printing conventions, e.g. whether
  to include the `np.` or whether NaN's should be `np.float64(nan)` or
  not.  But bike-sheds happening now have a much better chance of
  being heard :).

1. The current NEP states that we use `np.str_` and `np.bytes_`.  There
is some chance that the top-level names could be changed, in that case
the representation would change accordingly.  (I consider this an
adjustment we can do without the NEP.)

2. To properly implement the NEP, we need to automate some of the
documentation changes necessary.  This should also enable downstream to
do the same or at least have a blueprint as a starting point.
(Help with this work is greatly appreciated, since it is its own small
project to hook into the doctest utilities.)
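
To give an idea of what such automation could look like, here is a minimal
sketch using only the standard-library `doctest` parser.  It is not the
tooling planned for NumPy: `find_stale_outputs` and `FakeFloat64` are
hypothetical names, and `FakeFloat64` merely stands in for a scalar whose
`repr()` has changed.  The function re-evaluates each documented example
and reports those whose documented output no longer matches:

```python
import doctest


class FakeFloat64(float):
    """Stand-in for a scalar with the new-style repr (hypothetical)."""

    def __repr__(self):
        return f"np.float64({float.__repr__(self)})"


def find_stale_outputs(text, globs):
    """Return (source, documented, actual) for examples whose output changed."""
    stale = []
    namespace = dict(globs)
    for example in doctest.DocTestParser().get_examples(text):
        source = example.source.strip()
        documented = example.want.strip()
        try:
            # Expressions yield a value whose repr the docs display.
            value = eval(source, namespace)
        except SyntaxError:
            # Statements (assignments, imports) only run for side effects.
            exec(source, namespace)
            continue
        actual = repr(value)
        if documented and actual != documented:
            stale.append((source, documented, actual))
    return stale


docs = """
>>> x = FakeFloat64(6.4)
>>> x
6.4
"""
print(find_stale_outputs(docs, {"FakeFloat64": FakeFloat64}))
```

A real tool would also have to write the new output back into the source
file and cope with multi-line statements, exceptions and doctest option
flags, none of which this sketch attempts.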

I plan on adding a brief note about helping with doc updates to the
NEP when accepting it.  Ross was planning to add a table of changed
examples, although I don't think that is necessary for accepting.

Cheers,

Sebastian



On Fri, 2022-10-28 at 10:54 +0200, Sebastian Berg wrote:
> Hi all,
> 
> As mentioned earlier, I would like to propose changing the
> representation of scalars in NumPy.  Discussion and ideas on changes
> are much appreciated!
> 
> The main change is to show scalars as:
> 
> * `np.float64(3.0)` instead of just `3.0`
> * `np.True_` instead of `True`
> * `np.void((3, 5), dtype=[('a', '   `(3, 5)`
> * Use `np.` rather than `numpy.` for datetime/timedelta.
> 
> This way it is clear for users that they are dealing with NumPy
> scalars, which behave differently from Python scalars.
> The `str()` that is given when using `print()` and the way arrays are
> shown will be unchanged.
> 
> The NEP draft can be found here:
> 
>     https://numpy.org/neps/nep-0051-scalar-representation.html
> 
> and it includes more details and related changes.
> 
> The implementation is largely finished and can be found here:
> 
>    https://github.com/numpy/numpy/pull/22449
> 
> We are fairly late in the release cycle and the change should not
> block other things.  So, the aim is to merge it early in the next
> release cycle.  That way downstream has time to fix documentation if
> wanted.
> 
> Depending on how discussion goes, I hope to formally propose the NEP
> fairly soon, so that merging the implementation doesn't need to
> wait on NEP approval.
> 
> Cheers,
> 
> Sebastian
> 
> 
> 
> 
> On Thu, 2022-09-08 at 11:38 +0200, Sebastian Berg wrote:
> > 
> > TL;DR:  NumPy scalars representation is e.g. `34.3` instead of
> > `float32(34.3)`.  So the representation is missing the type
> > information.  What are your thoughts on changing that?
> > 
> > 
> > Hi all,
> > 
> > I am thinking about the next steps for NEP 50 (the NEP wants to fix
> > the NumPy promotion rules, especially with respect to scalars):
> > 
> >     https://numpy.org/neps/nep-0050-scalar-promotion.html
> > 
> > In relation to that, there was one point that Stéfan brought up
> > previously.
> > 
> > The NumPy scalars (representation) currently print as numbers:
> > 
> >     >>> np.float32(34.3)
> >     34.3
> >     >>> np.uint8(5)
> >     5
> > 
> > That can already be confusing now.  However, it gets more
> > problematic if NEP 50 is introduced, since the behavior between a
> > Python `34.3` and `np.float32(34.3)` would differ more than it does
> > now (please refer to the NEP).
> > 
> > The change would be that we should print as:
> > 
> >     float64(34.3)  (or similar?)
> > 
> > This email is mainly to ask for any feedback or concerns on such a
> > change.  I suspect we may have to write a very brief NEP about it.
> > 
> > If there is little concern, maybe we could move forward such a
> > 

[Numpy-discussion] Re: Adding bit_count ufunc

2022-11-25 Thread Sebastian Berg
Thanks for bringing this up again.  The Python method exists and it
seems like relatively basic functionality.

Overall, I am slightly in favor of adding the ufunc.  So if nobody
voices an opinion that it doesn't seem a good fit for NumPy, I would be
happy to move forward with it.

- Sebastian


PS: One of my main concerns would be if we were to add many bitwise
functions, in which case a `bitwise` namespace might be nice.  But I am
not convinced that should stop us here.


On Thu, 2022-11-24 at 15:56 +, Doug Turnbull wrote:
> 👋Long time numpy user, and big fan of all your work
> 
> TL;DR - there's a `bit_count` method on numpy scalars and Python
> ints; I'm advocating for a `bit_count` `ufunc`, as has been
> implemented in this PR:
> 
> https://github.com/numpy/numpy/pull/21429
> 
> A number of people have requested this as a numpy feature and there's
> a lot of great discussion at this issue
> 
> https://github.com/numpy/numpy/issues/16325
> 
> A bit count comes up in certain similarity situations (like hamming
> distance). This involves an xor with numpy arrays and a bit count,
> where the bit count is currently the bottleneck by a few orders of
> magnitude in my local benchmarking. Currently there are a number of
> not particularly fast workarounds for doing this with numpy arrays, like
> the bit-twiddling solutions here
> 
> https://stackoverflow.com/a/68943135/8123
> https://stackoverflow.com/a/109025/8123
> 
> So I'd love to be able to continue the work of bit_count -> numpy
> array ufunc to get performance gains at the array level 🙏
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net




[Numpy-discussion] Re: Adding bit_count ufunc

2022-11-25 Thread Serge Guelton
On Fri, Nov 25, 2022 at 08:09:02PM +0100, Sebastian Berg wrote:
> Thanks for bringing this up again.  The Python method exists and it
> seems like relatively basic functionality.
> 
> Overall, I am slightly in favor of adding the ufunc.  So if nobody
> voices an opinion that it doesn't seem a good fit for NumPy, I would be
> happy to move forward with it.
> 
> - Sebastian
> 
> 
> PS: One of my main concerns would be if we were to add many bitwise
> functions, in which case a `bitwise` namespace might be nice.  But I am
> not convinced that should stop us here.

Technically speaking, bitwise_and, bitwise_or, bitwise_xor and bitwise_not
already exist, and popcount is widespread: it already has a compiler builtin
under the name __builtin_popcount.
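
For context, the Hamming-distance use case raised in this thread (an XOR
followed by a bit count) can be sketched with existing NumPy functions.
This is only an illustrative workaround, not the ufunc proposed in the PR:
`popcount_u8` and `hamming_distance` are hypothetical names, and the
lookup-table approach is just one of several possible emulations, limited
here to `uint8` data:

```python
import numpy as np

# Precomputed number of set bits for every possible uint8 value.
_POPCOUNT_TABLE = np.array(
    [bin(i).count("1") for i in range(256)], dtype=np.uint8
)


def popcount_u8(a):
    """Count set bits per element of a uint8 array via table lookup."""
    return _POPCOUNT_TABLE[a]


def hamming_distance(a, b):
    """Number of differing bits between two equally shaped uint8 arrays."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    # XOR leaves a 1 wherever the inputs differ; then count those bits.
    return int(popcount_u8(np.bitwise_xor(a, b)).sum())


print(hamming_distance([0b1010, 0xFF], [0b0110, 0x00]))  # 10
```

A dedicated ufunc would avoid the extra table lookup and the temporary
array, and could map to a hardware popcount instruction where available.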


[Numpy-discussion] Questions on contributing

2022-11-25 Thread Kode D. Creer
Hi. I would like to help contribute to the code, and I would like to know where
to get started. I have a couple of features in mind that I feel are missing and
would be helpful.


[Numpy-discussion] Re: Questions on contributing

2022-11-25 Thread Inessa Pawson
Thank you for your interest in contributing to NumPy, Kode!
Here is a quick overview of all the ways one can contribute to the project:
https://numpy.org/contribute/
If you’d like to focus on the source code, have a look at the issues that
are up to 12 months old in the issue tracker (
https://github.com/numpy/numpy/issues) and see if there is anything you
could fix. The ones labeled “documentation” and “bug” would be the best
place to start. With bugs, reproducing a bug and documenting it in a
comment (below the issue) would be of tremendous help. For general Python
development in NumPy, modules like `numpy.ma` and `numpy.polynomial` are
pure Python and are fairly approachable.

On Fri, Nov 25, 2022 at 9:35 PM Kode D. Creer wrote:

> Hi. I would like to help contribute to the code, and I would like to know
> where to get started. I have a couple of features in mind that I feel are
> missing and would be helpful.
>

-- 
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson