Hi Robert,

Thank you for the pointers.

I think numpy.random should have a mechanism to choose between methods for
generating the underlying randomness dynamically, at run time, as well as an
extensible framework where developers could add more methods. The default
would be MT19937 for backwards compatibility. It is important to be able to do
this at run time, as it would allow one to use different algorithms in
different threads (like different members of the parallel Mersenne Twister
family of generators, see MT2203).
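To make the run-time choice concrete, here is a minimal sketch, assuming a hypothetical registry (`BASIC_METHODS`, `register_basic_method`, `make_basic_generator` are illustrative names, not an existing numpy API). MT19937 is provided by wrapping Python's `random.Random`, which CPython implements as a Mersenne Twister. SplitMix64 is a simple, unrelated algorithm used only as a stand-in for something like MT2203. Each thread builds its own generator, possibly of a different algorithm.

```python
import random
import threading

# Hypothetical registry: method name -> factory taking a seed.
BASIC_METHODS = {}


def register_basic_method(name):
    """Decorator that registers a basic-generator factory under `name`."""
    def decorator(factory):
        BASIC_METHODS[name] = factory
        return factory
    return decorator


@register_basic_method("MT19937")
class MT19937:
    """Wraps random.Random, which CPython implements as MT19937."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def next_uint32(self):
        return self._rng.getrandbits(32)


@register_basic_method("SplitMix64")
class SplitMix64:
    """A second, unrelated algorithm (SplitMix64), standing in for MT2203 etc."""

    MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self.MASK

    def next_uint32(self):
        self.state = (self.state + 0x9E3779B97F4A7C15) & self.MASK
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & self.MASK
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & self.MASK
        return (z ^ (z >> 31)) >> 32  # keep the high 32 bits


def make_basic_generator(method="MT19937", seed=0):
    """Select the algorithm at run time; MT19937 remains the default."""
    if method not in BASIC_METHODS:
        raise ValueError(f"unknown method {method!r}; known: {sorted(BASIC_METHODS)}")
    return BASIC_METHODS[method](seed)


def worker(method, seed, results, index):
    # each thread owns its own generator, possibly of a different algorithm
    gen = make_basic_generator(method, seed)
    results[index] = [gen.next_uint32() for _ in range(3)]


results = [None, None]
jobs = [("MT19937", 1), ("SplitMix64", 2)]
threads = [threading.Thread(target=worker, args=(m, s, results, i))
           for i, (m, s) in enumerate(jobs)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because generators are never shared between threads here, no locking is needed; a shared generator would need one.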

The framework should allow one to define randomness as a bit stream, a stream of
fixed-size integers, or a stream of uniform reals (32 or 64 bits). This is a
lot like MKL's abstract method for basic pseudo-random number generation.

https://software.intel.com/en-us/node/590373
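One way these views could relate is sketched below, assuming a hypothetical base class `BasicStream` where a backend only supplies 64-bit words and the bit-stream, fixed-size-integer and uniform-real views are derived from them. This is only one direction; a real framework might also let a backend supply reals natively, as MKL's abstract streams do.

```python
import abc
import random


class BasicStream(abc.ABC):
    """Hypothetical base class: a backend only has to supply 64-bit words."""

    @abc.abstractmethod
    def next_uint64(self):
        """Return an integer uniformly distributed on [0, 2**64)."""

    def next_bits(self, k):
        # bit-stream view: k random bits (1 <= k <= 64) from the top of a word
        if not 1 <= k <= 64:
            raise ValueError("k must be in 1..64")
        return self.next_uint64() >> (64 - k)

    def next_uint32(self):
        # fixed-size integer view
        return self.next_bits(32)

    def next_double(self):
        # 53 random bits scaled into [0, 1): every value is an exact double
        return (self.next_uint64() >> 11) * 2.0**-53

    def next_float(self):
        # 24 random bits scaled into [0, 1): exactly representable in float32
        return (self.next_uint64() >> 40) * 2.0**-24


class PythonMT(BasicStream):
    """Example backend built on random.Random (a Mersenne Twister)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def next_uint64(self):
        return self._rng.getrandbits(64)
```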

Each method should provide routines to sample from uniform distributions over 
reals (in floats and doubles), as well as over integers.
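As an illustration of the two uniform routines, here is a minimal sketch built on `random.Random.getrandbits` (the function names are hypothetical). The integer routine uses masked rejection sampling, which avoids the bias a plain modulo reduction would introduce.

```python
import random


def uniform_real(rng, low, high):
    """Sample from [low, high) (up to rounding at the upper end) via a 53-bit double."""
    u = (rng.getrandbits(64) >> 11) * 2.0**-53  # u is in [0, 1)
    return low + (high - low) * u


def uniform_int(rng, low, high):
    """Sample an integer uniformly from [low, high) without modulo bias."""
    n = high - low
    if n <= 0:
        raise ValueError("need low < high")
    k = (n - 1).bit_length()  # smallest bit width that covers 0..n-1
    while True:
        x = rng.getrandbits(k) if k else 0
        if x < n:  # reject draws that fall outside the range
            return low + x
```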

All remaining non-uniform distributions build on top of these uniform streams.
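For example, inversion turns a uniform variate into a non-uniform one. The sketch below (hypothetical function names) derives an exponential and a geometric sampler from the same 53-bit uniform double.

```python
import math
import random


def standard_uniform(rng):
    # 53-bit uniform double in [0, 1)
    return (rng.getrandbits(64) >> 11) * 2.0**-53


def exponential(rng, scale=1.0):
    """Exponential(scale) by inversion: X = -scale * log(1 - U)."""
    u = standard_uniform(rng)
    return -scale * math.log1p(-u)  # 1 - u is in (0, 1], so the log is finite


def geometric(rng, p):
    """Trials up to and including the first success (support 1, 2, ...)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    u = standard_uniform(rng)
    # floor(log(1 - U) / log(1 - p)) + 1 has P(X > k) = (1 - p)**k
    return 1 + math.floor(math.log1p(-u) / math.log1p(-p))
```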

I think it is pretty important to refactor numpy.random to allow the underlying
generators to produce a given number of independent variates at a time. There
could be convenience wrapper functions to get one variate at a time for
backwards compatibility, but this change in design would allow for better
efficiency, as sampling a vector of random variates at once is often faster
than repeatedly sampling one at a time, due to amortized set-up cost,
vectorization, etc.
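Here is a minimal sketch of that design, assuming a hypothetical class `VectorUniform`. The core routine fills a whole block per call, and the scalar form (`size=None`) is only a thin wrapper over a block of one. How much a block actually saves depends on the backend; here it simply means one `getrandbits` call per block instead of one per variate.

```python
import random
import struct


class VectorUniform:
    """Hypothetical generator whose core routine produces n variates per call."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def uint64_block(self, n):
        # one call draws 64*n bits, so per-call overhead is paid once per block
        if n == 0:
            return ()
        raw = self._rng.getrandbits(64 * n).to_bytes(8 * n, "little")
        return struct.unpack(f"<{n}Q", raw)

    def random(self, size=None):
        """Uniform doubles in [0, 1): a list for size=n, a float for size=None."""
        n = 1 if size is None else size
        block = [(w >> 11) * 2.0**-53 for w in self.uint64_block(n)]
        # scalar convenience wrapper kept for backwards compatibility
        return block[0] if size is None else block
```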

Finally, methods to sample a particular distribution should uniformly support
a method keyword argument. Because method names vary from distribution to
distribution, it should ideally be programmatically discoverable which methods
are supported for a given distribution. For instance, the standard normal
distribution could support method='Inversion', method='Box-Muller',
method='Ziggurat', and method='Box-Muller-Marsaglia' (the one used in numpy.random
right now), as well as a bunch of unnamed methods based on the transformed
rejection method (see http://statistik.wu-wien.ac.at/anuran/).
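A minimal sketch of a `method` keyword with discovery, assuming hypothetical names (`METHODS`, `supported_methods`, `standard_normal`). It implements inversion (via `statistics.NormalDist.inv_cdf`), Box-Muller and the Marsaglia polar variant; Ziggurat and the transformed-rejection methods are omitted for brevity.

```python
import math
import random
import statistics

METHODS = {}  # distribution -> {method name: sampler(rng, n)}


def _register(distribution, name):
    def decorator(fn):
        METHODS.setdefault(distribution, {})[name] = fn
        return fn
    return decorator


def _open_uniform(rng):
    # uniform on the open interval (0, 1): midpoint of a 53-bit grid cell
    return (rng.getrandbits(53) + 0.5) * 2.0**-53


@_register("normal", "Inversion")
def _inversion(rng, n):
    inv = statistics.NormalDist().inv_cdf
    return [inv(_open_uniform(rng)) for _ in range(n)]


@_register("normal", "Box-Muller")
def _box_muller(rng, n):
    out = []
    while len(out) < n:
        r = math.sqrt(-2.0 * math.log(_open_uniform(rng)))
        theta = 2.0 * math.pi * _open_uniform(rng)
        out.extend((r * math.cos(theta), r * math.sin(theta)))
    return out[:n]


@_register("normal", "Box-Muller-Marsaglia")
def _polar(rng, n):
    out = []
    while len(out) < n:
        v1 = 2.0 * _open_uniform(rng) - 1.0
        v2 = 2.0 * _open_uniform(rng) - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:  # reject points outside the unit disk
            f = math.sqrt(-2.0 * math.log(s) / s)
            out.extend((v1 * f, v2 * f))
    return out[:n]


def supported_methods(distribution):
    """Programmatic discovery of the methods registered for a distribution."""
    return sorted(METHODS.get(distribution, {}))


def standard_normal(rng, size, method="Box-Muller-Marsaglia"):
    samplers = METHODS["normal"]
    if method not in samplers:
        raise ValueError(f"method must be one of {supported_methods('normal')}")
    return samplers[method](rng, size)
```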

It would also be good if one could dynamically register a new method to sample
from a non-uniform distribution. This would allow, for instance, methods that
sample certain non-uniform distributions by directly calling into MKL (or another
library) to be added automatically when available, instead of building them
from uniforms (which may remain a fallback method).

The linked project is a good start, but, as far as I understood, the choice of
the underlying algorithm needs to be made at run time, and the only interface
provided to query random variates is one at a time, just as is currently the
case in numpy.random.

Oleksandr

From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of 
Robert Kern
Sent: Friday, June 17, 2016 10:23 AM
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] Design feedback solicitation

On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr 
<oleksandr.pav...@intel.com> wrote:
>
> Hi,
>
> I am new to this list, so I will start with an introduction. My name is 
> Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for 
> Python, and previously worked at Wolfram Research for 12 years. My latest
> project was to write a mirror of numpy.random, named numpy.random_intel. The
> module uses MKL to sample from different distributions for efficiency. It 
> provides support for different underlying algorithms for basic pseudo-random 
> number generation, i.e. in addition to MT19937, it also provides SFMT19937, 
> MT2203, etc.
>
> I recently published a blog post about it:
>
> https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
>
> I originally attempted to simply replace numpy.random in the Intel
> Distribution for Python with the new module, but because its fixed-seed output
> is not backwards compatible, this resulted in numerous test failures in numpy,
> scipy, pandas and other modules.
>
> Unlike numpy.random, the new module generates a vector of random numbers at a 
> time, which can be done faster than repeatedly generating the same number of 
> variates one at a time.
>
> The source code for the new module is not upstreamed yet, and this email is 
> meant to solicit early community feedback to allow for faster acceptance of 
> the proposed changes.

Cool! You can find pertinent discussion here:

  https://github.com/numpy/numpy/issues/6967

And the current effort for adding new core PRNGs here:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
