Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
On Wed, Sep 26, 2012 at 12:14 PM, Marc Glisse wrote:
>> Currently the random number engine interface is
>> inefficient since it returns a single number. What we need is an
>> additional interface to return vectors.
>
> Isn't the __generate interface good enough?

__generate is for the distribut

Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Marc Glisse
On Wed, 26 Sep 2012, Ulrich Drepper wrote:
> On Wed, Sep 26, 2012 at 7:32 AM, Jakub Jelinek wrote:
>> Have you considered also an __AVX__ version handling 4 elements at a time?
>> Without __AVX2__ one would need to cast __m256i to __m256d for and/or, as
>> AVX1 doesn't have _mm256_and_si256 or _mm256_or_

Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
On Wed, Sep 26, 2012 at 7:32 AM, Jakub Jelinek wrote:
> Have you considered also an __AVX__ version handling 4 elements at a time?
> Without __AVX2__ one would need to cast __m256i to __m256d for and/or, as
> AVX1 doesn't have _mm256_and_si256 or _mm256_or_si256, but _mm256_and_pd
> or _mm256_or_p
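The cast workaround Jakub describes for AVX1 can be sketched as follows. This is only an illustration, not code from the patch: the helper name `and4_u64` is hypothetical, and the scalar fallback is an assumption added so the snippet also builds when `__AVX__` is not defined. The casts reinterpret the register bits without converting values, so the integer lanes pass through `_mm256_and_pd` unchanged.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the patch): bitwise AND of four 64-bit lanes.
// With AVX1 only, there is no 256-bit integer AND, so the integer vectors are
// reinterpreted as double vectors and _mm256_and_pd is used instead.
inline void and4_u64(const std::uint64_t* a, const std::uint64_t* b,
                     std::uint64_t* out)
{
#if defined(__AVX__)
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    // Casts are bit-level reinterpretations; no value conversion happens.
    __m256d r = _mm256_and_pd(_mm256_castsi256_pd(va),
                              _mm256_castsi256_pd(vb));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out),
                        _mm256_castpd_si256(r));
#else
    // Scalar fallback (assumption, for builds without AVX enabled).
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] & b[i];
#endif
}
```

The OR case would follow the same pattern with `_mm256_or_pd`. Building with `-mavx` (GCC) selects the intrinsic path; without it the fallback loop is used and gives the same result.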

Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Jakub Jelinek
On Wed, Sep 26, 2012 at 07:16:09AM -0400, Ulrich Drepper wrote:
> Here is a patch to accelerate the __generate function for the
> normal_distribution class. The speed-up is quite significant,
> the amount depending on which random number engine is used.
>
> mt19937       +20%
>
> mt19937_64

Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Paolo Carlini
Hi,

On 09/26/2012 01:16 PM, Ulrich Drepper wrote:
> Here is a patch to accelerate the __generate function for the
> normal_distribution class. The speed-up is quite significant,
> the amount depending on which random number engine is used.
>
> mt19937       +20%
> mt19937_64    +30%
> sfmt19937     +30

[PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
Here is a patch to accelerate the __generate function for the normal_distribution class. The speed-up is quite significant, the amount depending on which random number engine is used.

mt19937       +20%
mt19937_64    +30%
sfmt19937     +30%
sfmt19937_64  +30%

This patch introduces a head