On Wed, Sep 26, 2012 at 07:16:09AM -0400, Ulrich Drepper wrote:
> Here is a patch to accelerate the __generate function for the
> normal_distribution<double> class.  The speed-up is quite significant,
> the amount depending on which random number engine is used.
>
> mt19937        +20%
> mt19937_64     +30%
> sfmt19937      +30%
> sfmt19937_64   +30%
>
> This patch introduces a header with optimizations for <random>.  No
> changes to existing code needed, this is a straight-forward
> specialization.  Tested on x86_64-linux.  More optimizations follow,
> there is still quite a bit of inefficiency in the existing interfaces.
>
> OK to commit?
Have you also considered an __AVX__ version that handles 4 elements at a
time?  Without __AVX2__, one would need to cast __m256i to __m256d for the
and/or operations, since AVX1 has no _mm256_and_si256 or _mm256_or_si256;
_mm256_and_pd and _mm256_or_pd could be used instead.

	Jakub
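A minimal sketch of the AVX1 approach, in C with GCC/Clang intrinsics.  The
patch body is not quoted here, so the bit manipulation is an assumption: it
uses the common trick of keeping the low 52 random bits (AND) and OR-ing in
the exponent of 1.0, which yields a double in [1, 2).  The helper names
bits_to_unit_scalar, bits_to_unit_avx and bits_to_unit are hypothetical.  The
point it illustrates is the cast: _mm256_castsi256_pd only reinterprets the
register, so _mm256_and_pd/_mm256_or_pd do the work that
_mm256_and_si256/_mm256_or_si256 would do under AVX2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep the 52 mantissa bits (AND) and force the
   exponent of 1.0 (OR), giving a double in [1, 2).  Scalar reference. */
static double bits_to_unit_scalar(uint64_t r)
{
    uint64_t u = (r & UINT64_C(0x000FFFFFFFFFFFFF))
                 | UINT64_C(0x3FF0000000000000);
    double d;
    memcpy(&d, &u, sizeof d);   /* bit-cast without aliasing problems */
    return d;
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Same transform, 4 elements at a time, using only AVX1.  AVX1 lacks
   _mm256_and_si256/_mm256_or_si256, so the integer vector is cast to
   __m256d (a no-op reinterpretation) and the _pd bitwise ops are used. */
__attribute__((target("avx")))
static void bits_to_unit_avx(const uint64_t *in, double *out)
{
    const __m256d mant_mask =
        _mm256_castsi256_pd(_mm256_set1_epi64x(0x000FFFFFFFFFFFFFLL));
    const __m256d one_exp =
        _mm256_castsi256_pd(_mm256_set1_epi64x(0x3FF0000000000000LL));

    __m256i r = _mm256_loadu_si256((const __m256i *) in);
    __m256d v = _mm256_castsi256_pd(r);
    v = _mm256_or_pd(_mm256_and_pd(v, mant_mask), one_exp);
    _mm256_storeu_pd(out, v);
}
#endif

/* Convert n random words: use the AVX path in blocks of 4 when the CPU
   supports it, then finish the remainder (or everything) in scalar code. */
void bits_to_unit(const uint64_t *in, double *out, size_t n)
{
    size_t i = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx"))
        for (; i + 4 <= n; i += 4)
            bits_to_unit_avx(in + i, out + i);
#endif
    for (; i < n; ++i)
        out[i] = bits_to_unit_scalar(in[i]);
}
```

The target("avx") attribute plus the __builtin_cpu_supports check lets the
file build without -mavx and fall back to scalar code on older CPUs.  Because
AND/OR are purely bitwise, the _pd forms give the same bits as the integer
forms would; on some microarchitectures moving data between the integer and
floating-point domains may cost a bypass delay, which is worth measuring.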