[Numpy-discussion] Arbitrarily large random integers

2023-08-19 Thread Dan Schult
How can we use numpy's random `integers` function to get uniformly selected 
integers from an arbitrarily large `high` limit? This is important when dealing 
with exact probabilities in combinatorially large solution spaces. 

I propose that we add the capability for `integers` to construct arrays of type 
object_ by having it construct Python `int` objects as the elements of the 
returned array. This would allow arbitrarily large integers.
 
The Python random library's `randrange` constructs values for arbitrary upper 
limits -- and they are exact when using subclasses of `random.Random` with a 
`getrandbits` method (which includes the default rng for most operating 
systems).

Numpy's random `integers` function rightfully raises on `integers(20**20, 
dtype=int64)` because the upper limit is above what can be held in an `int64`. 
But Python `int` objects store arbitrarily large integers. So I would expect 
`integers(20**20, dtype=object)` to create random integers on the desired 
range. Instead, a TypeError is raised: `Unsupported dtype dtype('O') for 
integers`. It seems we could provide support for dtype('O') by constructing 
Python `int` values, and this would allow arbitrarily large ranges of integers.
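
The behavior described above can be reproduced with a short script. This is only a sketch: it assumes a NumPy version in which object-dtype support has not been added, and the `describe` helper is illustrative, not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(12345)

def describe(dtype):
    """Illustrative helper: report what `integers` does for a given dtype."""
    try:
        return rng.integers(20**20, dtype=dtype)
    except (TypeError, ValueError) as exc:
        # Return the exception instead of raising so it can be inspected.
        return exc

int64_result = describe(np.int64)   # 20**20 does not fit in an int64
object_result = describe(object)    # object dtype is currently rejected

print(type(int64_result).__name__, int64_result)
print(type(object_result).__name__, object_result)
```

Both calls fail today, but for different reasons: the first because of the range of `int64`, the second because the dtype itself is not supported.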

The core of this functionality would be close to the seven lines used in [the 
code of 
random.Random._randbelow](https://github.com/python/cpython/blob/eb953d6e4484339067837020f77eecac61f8d4f8/Lib/random.py#L242), 
which:
1) finds the number of bits needed to represent the `high` argument.
2) generates that number of random bits.
3) converts them to a Python int and checks whether it is greater than or equal 
to `high` (the upper limit is exclusive). If so, repeat from step 2.
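
The three steps above can be sketched on top of a NumPy `Generator`. This is a minimal illustration, not a proposed implementation: `large_integers` is a hypothetical helper name, it only handles a one-dimensional `size`, and it draws raw bytes with `Generator.bytes` where `_randbelow` uses `getrandbits`.

```python
import numpy as np

def large_integers(rng, high, size):
    """Hypothetical helper: uniform Python ints in [0, high) as an object array."""
    if high <= 0:
        raise ValueError("high must be positive")
    k = high.bit_length()        # step 1: bits needed to represent `high`
    nbytes = (k + 7) // 8        # whole bytes needed to hold k bits
    excess = nbytes * 8 - k      # surplus bits to discard from each draw
    out = np.empty(size, dtype=object)
    for i in range(size):
        while True:
            # step 2: draw k uniformly random bits as a Python int
            r = int.from_bytes(rng.bytes(nbytes), "little") >> excess
            if r < high:         # step 3: reject and redraw if r >= high
                break
        out[i] = r
    return out

rng = np.random.default_rng(2023)
values = large_integers(rng, 20**20, 5)
```

Because `high` needs exactly `k` bits, each draw is accepted with probability greater than one half, so the loop needs fewer than two draws per value on average. Only one RNG state is involved.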

I realize that people can just use `random.randrange` to obtain this 
functionality, but that doesn't return an array, and uses a different RNG 
possibly requiring tracking two RNG states.  

This text was also used to create [Issue 
#24458](https://github.com/numpy/numpy/issues/24458)
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] What to do with np.matrix

2024-10-12 Thread Dan Schult
Hello everyone,
I've been helping scipy.sparse provide a sparse array interface alongside
sparse matrix.
With that effort making good progress, we are starting to think about
bigger implications.

Now may be a good time to reconsider *what to do with np.matrix*.
I think we have three options:

1) Leave np.matrix as is (with the docs discouraging its use)
2) Deprecate and then remove np.matrix
3) Split np.matrix out as a separate library `numpy-matrix` (maybe
eventually with spmatrix)

(Below, I've listed some arguments for each option from previous mailing
list conversations.)

I'm planning on writing a NEP to summarize the discussion that follows,
before proceeding
with any work. I'd appreciate your feedback on the feasibility of the above
options, or
ideas for other options to consider.

Thank you!
Dan

===
*From previous conversations (links below):*

Reasons for *1) leave as is*:

- Nothing needs to be done.
- Use for np.matrix is declining anyway.
- Old code can still use np.matrix without any changes.

Reasons for *2) remove np.matrix*:

- “The purpose here is to remove np.matrix from numpy so beginners will
never see it.”
- Option 3) has little payoff if the “teaching advantages” of np.matrix are
largely addressed by `@`.
- Slightly less maintenance time after the removal. (But more to remove it.)
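
For readers unfamiliar with the notational gap that `@` is said to close, here is a small comparison. It is a sketch assuming a NumPy version that still ships `np.matrix`. The variable names are illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
M = np.matrix(A)  # legacy class: `*` and `**` are matrix operations here

# Matrix product: `M * M` on np.matrix, `A @ A` on ndarray.
prod_matrix = np.asarray(M * M)
prod_array = A @ A

# Matrix power and inverse have function equivalents for ndarray.
pow_array = np.linalg.matrix_power(A, 3)
inv_array = np.linalg.inv(A)

# On ndarray, `*` stays elementwise, which is the semantic difference.
elementwise = A * A
```

The ndarray spellings for power and inverse are longer than `M**3` and `M.I`, which is the remaining part of the teaching argument.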

Reasons for *3) numpy-matrix library*:

- Allows/Forces old code to switch once to `npm.matrix` (better than
multiple times, worse than never).
- Old code can start to switch now… and will still work for any previous
numpy version.
- Devs still have to maintain the code.
- Beginners do not see npm.matrix when starting with numpy. But still
available.


*Details and summary of history:*

*Q*: Why now? *A*: We have made progress with handling np.matrix:
   - Python 2.7 is no longer supported
   - @ is now available
   - Docs now suggest that np.matrix should not be used for new code
   - Docs are largely free of np.matrix (but not all – subclass examples
     use np.matrix)
   - scipy.sparse supports sparray (spmatrix is still there too)
   - Array API sets expectations for containers.
   - We have the history of numpy-financial as a previous split-off library.

Previous discussion ideas not yet implemented (many others are already
implemented):
- “The purpose here is to remove np.matrix from numpy so beginners will
  never see it.”
- Easy for linear algebra, but later hard to extend to powerful
  ideas of ndarrays.
- `@` should make ndarray easy enough for linear algebra.
- 3 different semantics for ndarray, matrix, spmatrix. Matrix does evil
  by existing.
- Teaching concerns: `np.matrix` has powerful (readable) notation for
  matrix mult, pow, and inverse. Most agree that `@` is all that’s needed.
  The `pow` and `inv` methods are OK.
- Idea: Split out np.matrix as a library `numpy-matrix` -> npm.matrix
- Make the new library look at the numpy version and, if old, use np.matrix
  code. If numpy is new, use npm.matrix code. (Allows people to switch their
  code while still supporting old numpy. Creates pointers to np.matrix for
  type checking, etc.)
- Remove remaining examples of np.matrix from numpy docs.
- `spmatrix` and `np.matrix` are linked. Sometimes `spmatrix` returns
  `np.matrix`.
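
The idea of a split-off library that defers to NumPy's own class when it is available could look roughly like the sketch below. Everything here is an assumption for illustration: the `numpy_matrix` package does not exist, `resolve_matrix_class` is a made-up name, and the sketch tests for the attribute rather than comparing version numbers.

```python
import numpy as np

def resolve_matrix_class():
    """Return a matrix class, preferring the one bundled with NumPy."""
    if hasattr(np, "matrix"):
        # NumPy still ships np.matrix: reuse it so isinstance checks agree.
        return np.matrix
    # Hypothetical fallback: `numpy_matrix` stands for the proposed
    # split-off package from this thread; it does not exist today.
    from numpy_matrix import matrix
    return matrix

matrix = resolve_matrix_class()
m = matrix([[1, 2], [3, 4]])
```

User code would import `matrix` from the new library once and keep working on both old and new NumPy versions.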

*Links:*
- 2017 mailing list thread

- 2014 mailing list thread

- The mailing list in 2008 has multiple threads related to np.matrix.
Most posts are not directly related to this question.
But if you search the archives for "2008 np.matrix" you will find lots of
discussion.


[Numpy-discussion] Re: What to do with np.matrix

2024-10-19 Thread Dan Schult
This is quite helpful. Thanks!

Github search:
I'm not surprised that many github hits are like homework problems. The big 
resistance to removing np.matrix early on (~2008) came from educators who 
wanted a matrix-oriented experience for their students who had a recent linear 
algebra background. It was heavily used for at least a decade in the education 
setting. That started to wane when Python created the `@` operator. But change 
is slow. It's been 10 years with `@`.  

I very much support Ralf's concerns that this not push SciPy. It would be 
pushing me after all. But it is hard to know how to proceed with the transfer 
from spmatrix to sparray without also knowing something about the support for 
np.matrix from the numpy devs. But... there's no reason removal of spmatrix 
needs to happen first. It may be quite natural for both to be moved to a 
single separated package. Or they could be removed at the same time. We'll have 
to see what makes sense.

Thanks Nathan for the preview of github usage and perspective from the 2.0 
release. I'm also pleased to find that github search results for PRs can be 
sorted by date. While there are 28K PRs involving np.matrix, the recent PRs are 
almost all dependabot reminding folks to upgrade their dependencies. Of the top 
30, 3 were actions to **remove** np.matrix in favor of ndarray. 1 was 
`scipy.interpolate` (which also mentioned removing support for `np.matrix`, 
though provided a workaround instead). And the remaining 26 are dependency 
updates. That takes us back to Aug 30. Jumping to the most recent 80 PRs gave 
the same type of results, but I didn't bother counting. Almost all of them are 
dependency updates. Most of the rest are moving away from np.matrix. It is 
clear that recent activity (as measured by PRs) shows little new use of 
np.matrix.

Perhaps most importantly, there don't seem to be any courses being run this 
semester that have students creating PRs using np.matrix.

And thanks Marten, Sebastian and Chuck for the nudge to find a way to move 
forward with the deprecation process. I think the change to 
`VisibleDeprecationWarning` is a good next step. Hopefully we don't have to 
wait another 7 years for the following step unless we decide that keeping that 
code in numpy is the best way to go.  No one seems to have argued for just 
leaving np.matrix in the package forever, but I think it is a reasonable 
approach (similar to stating that RandomState will remain forever). But given 
the decline in usage, and the negative impacts of having multiple interfaces to 
array-like objects, it is probably better to stop supporting matrix at some 
point. 

Summary:  
It seems like eventually removing np.matrix is desirable. The choice of 
removing versus separating depends somewhat on how easy each is for both devs 
and users. It might be worth a short exploration to see if there is a 
solution. We should time this so it doesn't negatively impact the transition 
SciPy sparse is making. SciPy sparse is the main user, and leaving np.matrix as 
it is costs very little.  

Action items from this discussion include:
- Exploring impact on SciPy of a change to `VisibleDeprecationWarning`, 
possibly followed by a PR to make the change.
- Investigating a light-weight, simple separation package that wouldn't affect 
user experience much. If that's hard, then we have identified the pain points. 
If that's easy then it informs the choice of a path forward for both matrix and 
spmatrix.
- Collecting info about the current usage of np.matrix, and what type of usage 
the large existing codebase needs. Putting that info into a NEP, along with a 
summary of the history and current discussion, and a description of our 
exploration into possible light-weight routes to separation.

I don't expect this to be soon -- maybe by next summer -- unless other people 
get involved. I'm interested in further discussion and suggestions too.

FYI Chuck: It looks like Event Horizon Telescope doesn't use np.matrix at all 
any more.