Re: [PATCH, RFC, gfortran] Multi-threaded random number generator

Janne Blomqvist Fri, 25 Dec 2015 01:35:39 -0800

On Fri, Dec 25, 2015 at 2:39 AM, Damian Rouson
<dam...@sourceryinstitute.org> wrote:
> Does this patch change the behavior of multi-image programs the run with one
> thread per image, where each image an MPI rank?   I assume the answer is no.


AFAICT, no. Or well, as I mentioned previously, the sequence of random
numbers is different since a different generator is used.

> Just checking to be sure.
>
> Also, I frequently use the init_random_seed subroutine from the gfortran
> online documentation
> (https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html).  Will that
> need any modification based on the current patch?  In my usual use case (one
> thread per image), I assume nothing changes there.

It should continue working, although if said part of the patch is
accepted, in the future it should suffice to just call random_seed
with no arguments instead of init_random_seed or something like that.

FWIW, F2015 contains a random_init intrinsic which offers a bit more
control how the generator is seeded for co-array programs.

>
> Damian
>
>
>
> On Dec 23, 2015, at 2:29 PM, Janne Blomqvist <blomqvist.ja...@gmail.com>
> wrote:
>
> Hi,
>
> the GFortran random number generator (RANDOM_NUMBER and RANDOM_SEED
> intrinsics) has a number of issues that the attached preliminary patch
> tries to address.
>
> - 64-bit integers are available on all targets GFortran supports, and
> the vast majority of users nowadays use targets with native 64-bit
> capability.  Thus by using a PRNG that uses and generates 64-bit
> values we can get a bit of speedup compared to the current 32-bit
> generator.
>
> - The current implementation is a single stream generator protected by
> a mutex. This means that if multiple threads are calling
> RANDOM_NUMBER
>
> 1) It's impossible to provide repeatable streams by using the same
> seed, since thread scheduling is not deterministic.
>
> 2) Performance is not only bound by the performance of a single
> thread, but can easily be a lot slower due to lock contention.
>
> I have been thinking about how one could make use of a multi-threaded
> PRNG given the limitations of the RANDOM_NUMBER & RANDOM_SEED API, and
> I have come up with something that I think is usable.
>
> The attached patch replaces the current KISS generator by the late
> George Marsaglia with the xorshift1024* generator, an enhanced version
> of Marsaglias xorshift generator. It's a quite nice generator, e.g. it
>
> - passes the TestU1 suite.
>
> - has a quite long period, 2**1024 - 1. So even if one generates
> multiple seeds randomly, it's exceedingly unlikely that the streams
> will alias in any realistic timeframe.
>
> - furthermore, allows a "jump" function to quickly jump forwards
> 2**512 bits in the stream, which is nice for creating multiple
> independent streams. Thus, with a total period of 2**1024, it allows
> up to 2**512 substreams each providing 2**512 bits before any
> aliasing.
>
> - Code is relatively simple, similar to KISS.
>
> Now, in order to be able to use separate streams for each thread given
> the Fortran standard API, the patch is implemented as follows. There
> is a master_state, which is initially statically initialized. The
> per-thread state is stored in a thread-local variable, and is
> initially uninitialized.
>
> - When RANDOM_NUMBER is called, a check is made to see whether the
> per-thread generator state is initialized. If not, the state is
> copied from the master_state, and the jump() function is called N
> times, where N equals how many streams have previously been
> generated from the master_state (the njumps variable). When the
> per-thread generator state is initialized, a random number is
> generated by reading and updating the per-thread generator state.
>
> - When RANDOM_SEED(PUT=) is called, the master_state is updated with
> the new seed, njumps is reset to zero, and the current thread
> generator state is copied from master_state. Thus any new streams
> that are subsequently created use the new seed, whereas other
> existing streams will continue using their existing states.
>
> - When RANDOM_SEED is called without arguments, the master_seed and
> current thread seed is set to random data read from the OS
> /dev/urandom device. Otherwise like above.
>
> While the above description might appear a bit convoluted, I think the
> end results for users is somewhat intuitive and it supports the common
> use cases
>
> - For a single-threaded program, or a multi-threaded program that
> takes care to call RANDOM_NUMER/RANDOM_SEED from one thread only,
> the end result is just like with the current implementation.
>
> - For a multi-threaded program that doesn't call RANDOM_SEED, each
> thread gets its own deterministic random stream with up to 2**512
> bits before any aliasing occurs.
>
> - In order to get random initial seeds on each invocation of the program,
> just
> call random_seed without arguments, either before calling
> random_number from any threads, or then in each thread.
>
> Note that the patch is preliminary, it works so one can evaluate
> performance but there are bugs (e.g. njumps is never incremented, so
> all threads generate the same stream), documentation needs to be
> updated, RANDOM_SEED is not tested, the fronted check for the seed
> size needs to be updated, etc.
>
> See the attached file rbench.f90 for a test program one can use to
> evaluate performance. README.md contains results for said program. In
> short, serial performance is roughly on par with the current
> implementation, with threads the new one is a lot faster, up to two
> orders of magnitude with 4 threads and harvesting only a single value
> per RANDOM_NUMBER call.
>
> Comments welcome.
>
> --
> Janne Blomqvist
> <README.md><rand_threads.diff.gz><rbench.F90>
>
>



-- 
Janne Blomqvist

Re: [PATCH, RFC, gfortran] Multi-threaded random number generator

Reply via email to