On Fri, Dec 25, 2015 at 2:39 AM, Damian Rouson <dam...@sourceryinstitute.org> wrote: > Does this patch change the behavior of multi-image programs the run with one > thread per image, where each image an MPI rank? I assume the answer is no.
AFAICT, no. Or well, as I mentioned previously, the sequence of random numbers is different since a different generator is used. > Just checking to be sure. > > Also, I frequently use the init_random_seed subroutine from the gfortran > online documentation > (https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html). Will that > need any modification based on the current patch? In my usual use case (one > thread per image), I assume nothing changes there. It should continue working, although if said part of the patch is accepted, in the future it should suffice to just call random_seed with no arguments instead of init_random_seed or something like that. FWIW, F2015 contains a random_init intrinsic which offers a bit more control how the generator is seeded for co-array programs. > > Damian > > > > On Dec 23, 2015, at 2:29 PM, Janne Blomqvist <blomqvist.ja...@gmail.com> > wrote: > > Hi, > > the GFortran random number generator (RANDOM_NUMBER and RANDOM_SEED > intrinsics) has a number of issues that the attached preliminary patch > tries to address. > > - 64-bit integers are available on all targets GFortran supports, and > the vast majority of users nowadays use targets with native 64-bit > capability. Thus by using a PRNG that uses and generates 64-bit > values we can get a bit of speedup compared to the current 32-bit > generator. > > - The current implementation is a single stream generator protected by > a mutex. This means that if multiple threads are calling > RANDOM_NUMBER > > 1) It's impossible to provide repeatable streams by using the same > seed, since thread scheduling is not deterministic. > > 2) Performance is not only bound by the performance of a single > thread, but can easily be a lot slower due to lock contention. > > I have been thinking about how one could make use of a multi-threaded > PRNG given the limitations of the RANDOM_NUMBER & RANDOM_SEED API, and > I have come up with something that I think is usable. > > The attached patch replaces the current KISS generator by the late > George Marsaglia with the xorshift1024* generator, an enhanced version > of Marsaglias xorshift generator. It's a quite nice generator, e.g. it > > - passes the TestU1 suite. > > - has a quite long period, 2**1024 - 1. So even if one generates > multiple seeds randomly, it's exceedingly unlikely that the streams > will alias in any realistic timeframe. > > - furthermore, allows a "jump" function to quickly jump forwards > 2**512 bits in the stream, which is nice for creating multiple > independent streams. Thus, with a total period of 2**1024, it allows > up to 2**512 substreams each providing 2**512 bits before any > aliasing. > > - Code is relatively simple, similar to KISS. > > Now, in order to be able to use separate streams for each thread given > the Fortran standard API, the patch is implemented as follows. There > is a master_state, which is initially statically initialized. The > per-thread state is stored in a thread-local variable, and is > initially uninitialized. > > - When RANDOM_NUMBER is called, a check is made to see whether the > per-thread generator state is initialized. If not, the state is > copied from the master_state, and the jump() function is called N > times, where N equals how many streams have previously been > generated from the master_state (the njumps variable). When the > per-thread generator state is initialized, a random number is > generated by reading and updating the per-thread generator state. > > - When RANDOM_SEED(PUT=) is called, the master_state is updated with > the new seed, njumps is reset to zero, and the current thread > generator state is copied from master_state. Thus any new streams > that are subsequently created use the new seed, whereas other > existing streams will continue using their existing states. > > - When RANDOM_SEED is called without arguments, the master_seed and > current thread seed is set to random data read from the OS > /dev/urandom device. Otherwise like above. > > While the above description might appear a bit convoluted, I think the > end results for users is somewhat intuitive and it supports the common > use cases > > - For a single-threaded program, or a multi-threaded program that > takes care to call RANDOM_NUMER/RANDOM_SEED from one thread only, > the end result is just like with the current implementation. > > - For a multi-threaded program that doesn't call RANDOM_SEED, each > thread gets its own deterministic random stream with up to 2**512 > bits before any aliasing occurs. > > - In order to get random initial seeds on each invocation of the program, > just > call random_seed without arguments, either before calling > random_number from any threads, or then in each thread. > > Note that the patch is preliminary, it works so one can evaluate > performance but there are bugs (e.g. njumps is never incremented, so > all threads generate the same stream), documentation needs to be > updated, RANDOM_SEED is not tested, the fronted check for the seed > size needs to be updated, etc. > > See the attached file rbench.f90 for a test program one can use to > evaluate performance. README.md contains results for said program. In > short, serial performance is roughly on par with the current > implementation, with threads the new one is a lot faster, up to two > orders of magnitude with 4 threads and harvesting only a single value > per RANDOM_NUMBER call. > > Comments welcome. > > -- > Janne Blomqvist > <README.md><rand_threads.diff.gz><rbench.F90> > > -- Janne Blomqvist