[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Ilhan Polat
Note that this is independent of the memory waste. There are far worse
memory ops in NumPy than this, so I don't think that argument applies here
even if it did hold.

And like I mentioned, this is a very common operation, hence internals are
secondary. But it is not an unnecessary copy of the array anyway, because
concatenation by definition produces a new array. And it is, relatively
speaking, very laborious to do in NumPy. If it were really easy, people
would probably just slap a 0 at the beginning and move on.

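For concreteness, here is a minimal sketch of the workaround under
discussion; `cumsum0` is a hypothetical helper name, not an existing NumPy
function, and the sketch assumes 1-d input:

import numpy as np

def cumsum0(a):
    # hypothetical helper: cumulative sum starting from 0 (1-d input);
    # the extra allocation is exactly the copy discussed in this thread
    a = np.asarray(a)
    out = np.empty(a.size + 1, dtype=a.dtype)
    out[0] = 0
    np.cumsum(a, out=out[1:])
    return out

a = np.arange(1, 5)
print(cumsum0(a))                            # [ 0  1  3  6 10]
print(np.concatenate([[0], np.cumsum(a)]))   # the common one-liner
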
But instead we are now entering into a keyword commitment. I'm not sure I
agree that this strategy is better. I'm not against it, since there is
clearly demand, but inconvenience alone should probably not become the
justification for keyword arguments elsewhere.



On Fri, Aug 18, 2023 at 9:13 AM Ronald van Elburg <r.a.j.van.elb...@hetnet.nl> wrote:

> Ilhan Polat wrote:
>
> > I think all these point to missing convenience functionality for
> > extending arrays. In Matlab, "[0 arr 10]" nicely extends the array into
> > a new one, but in NumPy you need to punch in quite some code, and muster
> > some courage to remember whether the correct name is hstack or vstack or
> > concat or block, which decreases the "code morale".
>
> Not having a convenient workaround is not the only problem. The workaround
> is wasteful with memory and involves unnecessary copying of an array.
> Having a keyword implemented with these concerns in mind might avoid this.


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Ronald van Elburg
I think ultimately the copy is unnecessary.

That being said, introducing prepend and append functions concentrates the
complexity of the mapping in one place. Trying to avoid the extra copy would
probably lead to a more complex implementation of accumulate.

How, in your view, would the prepend interface differ from concatenation or
stacking?


[Numpy-discussion] Arbitrarily large random integers

2023-08-19 Thread Dan Schult
How can we use numpy's random `integers` function to get uniformly selected 
integers from an arbitrarily large `high` limit? This is important when dealing 
with exact probabilities in combinatorially large solution spaces. 

I propose that we add the capability for `integers` to construct arrays of
type `object_` by having it construct Python ints as the objects in the
returned array. This would allow arbitrarily large integers.
 
The Python random library's `randrange` constructs values for arbitrary
upper limits -- and they are exact when using subclasses of `random.Random`
with a `getrandbits` method (which includes the default RNG on most
operating systems).

NumPy's random `integers` function rightfully raises on `integers(20**20,
dtype=int64)` because the upper limit is above what can be held in an
`int64`. But Python `int` objects store arbitrarily large integers, so I
would expect `integers(20**20, dtype=object)` to create random integers in
the desired range. Instead a TypeError is raised: `Unsupported dtype
dtype('O') for integers`. It seems we could provide support for `dtype('O')`
by constructing Python `int` values, and this would allow arbitrarily large
ranges of integers.

The core of this functionality would be close to the seven lines used in
[the code of
random.Random._randbelow](https://github.com/python/cpython/blob/eb953d6e4484339067837020f77eecac61f8d4f8/Lib/random.py#L242),
which
1) finds the number of bits needed to describe the `high` argument,
2) generates that number of random bits, and
3) converts them to a Python int and checks whether it is at least `high`;
if so, repeats from step 2.

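A minimal sketch of those steps on top of a `numpy.random.Generator` (the
`randbelow` name and the bytes-based bit draw are my own choices here, not
NumPy or CPython API):

import numpy as np

def randbelow(rng, high):
    # 1) bits needed to describe ``high``
    k = high.bit_length()
    n_bytes = (k + 7) // 8
    while True:
        # 2) draw k random bits (as bytes, trimming the excess high bits)
        r = int.from_bytes(rng.bytes(n_bytes), "big") >> (n_bytes * 8 - k)
        # 3) accept only values below ``high``, otherwise redraw
        if r < high:
            return r

rng = np.random.default_rng()
print(randbelow(rng, 20**20))  # a uniform Python int in [0, 20**20)
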
I realize that people can just use `random.randrange` to obtain this
functionality, but that doesn't return an array, and it uses a different
RNG, possibly requiring tracking two RNG states.

This text was also used to create [Issue 
#24458](https://github.com/numpy/numpy/issues/24458)


[Numpy-discussion] Re: Arbitrarily large random integers

2023-08-19 Thread Kevin Sheppard
The easiest way to do this would be to write a pure Python implementation
of a masked integer sampler using Python ints. This way you could draw
unsigned integers and treat them as a bit pool. You would then take the
number of bits needed for your integer, assemble these into a Python int,
and finally apply the mask.

This is how integers are generated in the legacy RandomState code.

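A sketch of that bit-pool-and-mask approach, returning the object array
requested upthread (`masked_integers` is a made-up name, and the 32-bit
word size and per-element rejection loop are my assumptions):

import numpy as np

def masked_integers(rng, high, size):
    k = high.bit_length()
    mask = (1 << k) - 1          # keep only the low k bits
    n_words = (k + 31) // 32
    out = np.empty(size, dtype=object)
    for i in range(out.size):
        while True:
            # draw unsigned 32-bit words as the bit pool
            words = rng.integers(0, 1 << 32, size=n_words, dtype=np.int64)
            val = 0
            for w in words:
                val = (val << 32) | int(w)
            val &= mask
            if val < high:       # masking can still leave val >= high
                out.flat[i] = val
                break
    return out

rng = np.random.default_rng()
print(masked_integers(rng, 20**20, size=3))
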
Kevin


On Sat, Aug 19, 2023, 15:43 Dan Schult wrote:

> How can we use numpy's random `integers` function to get uniformly
> selected integers from an arbitrarily large `high` limit? [...]


[Numpy-discussion] Re: Arbitrarily large random integers

2023-08-19 Thread Robert Kern
On Sat, Aug 19, 2023 at 10:49 AM Kevin Sheppard wrote:

> The easiest way to do this would be to write a pure Python implementation
> of a masked integer sampler using Python ints. [...]

Indeed, that's how `random.Random` does it. I've commented on the issue
with an implementation that subclasses `random.Random` to use numpy PRNGs
as the source of bits for maximum compatibility with `Random`. The given
use case motivating this feature request is networkx, which manually wraps
numpy PRNGs in a class that incompletely mimics the `Random` interface. A
true subclass eliminates all of the remaining inconsistencies between the
two. I'm inclined to leave it at that and not extend the `Generator`
interface.

https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
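Robert's actual implementation is in the linked comment; purely to
illustrate the approach, a minimal subclass might look something like this
(the class name and bit-assembly details are mine, not from the issue, and
getstate/setstate are omitted):

import random
import numpy as np

class NumpyRandom(random.Random):
    # random.Random subclass drawing its bits from a numpy Generator

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else np.random.default_rng()
        super().__init__()

    def random(self):
        return self._rng.random()

    def getrandbits(self, k):
        # draw enough random bytes, then trim the excess high bits
        n_bytes = (k + 7) // 8
        val = int.from_bytes(self._rng.bytes(n_bytes), "big")
        return val >> (n_bytes * 8 - k)

    def seed(self, *args, **kwargs):
        # state lives in the numpy Generator; ignore stdlib seeding
        pass

r = NumpyRandom(np.random.default_rng(12345))
print(r.randrange(20**20))  # exact over an arbitrarily large range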

-- 
Robert Kern


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-19 Thread Dom Grigonis
Unfortunately, I don’t have a good answer.

For now, I can only tell you what I think might benefit from improvement.

1. Verbosity. I appreciate that bracket syntax such as the one in Julia or
Matlab, `[A B C ...]`, is not possible, so a functional interface is the
only option. E.g. Julia has functions named `cat`, `vcat`, `hcat`, `hvcat`.
I myself have recently aliased np.concatenate as `np_c`. For simple
operations, it would surely be nice to have methods, e.g.
`arr.append(axis)` / `arr.prepend(axis)`.

2. Excessive number of functions. There are very many functions for
concatenating and stacking. Many operations can be done using different
functions and approaches, and usually one of them is several times faster
than the rest. I will give an example: stacking two 1d vectors as the
columns of a 2d array.

import timeit
import numpy as np

arr = np.arange(100)
ways = [
    lambda: np.array([arr, arr]).T,
    lambda: np.vstack([arr, arr]).T,
    lambda: np.stack([arr, arr]).T,
    lambda: np.c_[arr, arr],
    lambda: np.column_stack((arr, arr)),
    lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1),
]
# TIMER in my original post is a custom helper; timeit is the stdlib
# equivalent
for f in ways:
    print(min(timeit.repeat(f, number=10_000)))
# mean timings I measured: [[0.012 0.044 0.052 0.13  0.032 0.024]]
Instead, having fewer, but more intuitive, flexible and well-optimised
functions would be more convenient.

3. The flattening and reshaping API is not very intuitive. E.g. torch's
flatten is an example of a function with the desired level of flexibility,
in contrast to `ndarray.flatten`:
https://pytorch.org/docs/stable/generated/torch.flatten.html. I had similar
issues with multidimensional searching, sorting, multi-dimensional overlaps
and custom unique functions. In other words, all the functionality is
already there, but in more custom multi-dimensional cases (although the
requirement is often very simple from the perspective of how it looks in my
mind) there is no easy API, and I end up writing my own numpy functions and
benchmarking numerous ways to achieve the same thing. By now I have my own
multi-dimensional unique, sort, search, flatten, and a more flexible ix_,
which are not well tested, but are already more convenient, more flexible
and often several times faster than the numpy ones (although all they do is
reuse existing numpy functionality).

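To illustrate the kind of flexibility meant here, a torch.flatten-style
partial flatten can be built on reshape in a few lines (a hypothetical
helper, not a proposal for the exact API):

import numpy as np

def flatten(arr, start_dim=0, end_dim=-1):
    # collapse dims start_dim..end_dim into one, like torch.flatten
    end = end_dim % arr.ndim
    new_shape = arr.shape[:start_dim] + (-1,) + arr.shape[end + 1:]
    return arr.reshape(new_shape)

a = np.zeros((2, 3, 4, 5))
print(flatten(a, 1, 2).shape)   # (2, 12, 5)
print(flatten(a).shape)         # (120,)
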
I think these are more along the lines of numpy 2.0 material rather than a
simple extension. It feels that the API can generally be more flexible and
intuitive, and there is enough existing numpy material, and enough external
examples, to draw from to make a next-level API happen. Although I
appreciate the required effort and the difficulties.

Having said all that, implementing Julia's equivalents `cat`, `vcat`,
`hcat`, `hvcat`, together with `arr.append(others, axis)` and
`arr.prepend(others, axis)`, while ensuring that they use the most
optimised approaches, could potentially make life easier for the time
being.

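For the 1-d case, such conveniences reduce to thin wrappers over
np.concatenate (the names and signatures here are illustrative only; note
that numpy already ships a free function np.append, but no np.prepend):

import numpy as np

def prepend(arr, values, axis=0):
    # minimal 1-d-oriented sketch; scalars are promoted to 1-d
    return np.concatenate([np.atleast_1d(values), arr], axis=axis)

def append(arr, values, axis=0):
    return np.concatenate([arr, np.atleast_1d(values)], axis=axis)

a = np.arange(5)
print(prepend(a, 0))    # [0 0 1 2 3 4]
print(append(a, 10))    # [0 1 2 3 4 10]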

—Nothing ever dies, just enters the state of deferred evaluation—
Dg

> On 19 Aug 2023, at 17:39, Ronald van Elburg wrote:
> 
> I think ultimately the copy is unnecessary. [...] How, in your view,
> would the prepend interface differ from concatenation or stacking?
