On Wed, 26 Sep 2012, Ulrich Drepper wrote:
> On Wed, Sep 26, 2012 at 7:32 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>> Have you considered also an __AVX__ version handling 4 elements at a time?
>> Without __AVX2__ one would need to cast __m256i to __m256d for and/or, as
>> AVX1 doesn't have _mm256_and_si256 or _mm256_or_si256, but _mm256_and_pd
>> or _mm256_or_pd could be used instead.
Or we can just use '|' on __m256i vectors, which generates a suitable
instruction depending on -mavx/-mavx2.
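To make the '|' suggestion concrete, here is a minimal sketch using GCC's generic vector extension. It assumes a GCC-compatible compiler; the names v4di and or_lanes are made up for this example, and v4di is declared by hand with the same shape as __m256i (four 64-bit lanes in 32 bytes) so that the snippet does not depend on <immintrin.h>. Which instruction the OR turns into is left to the compiler and the -m flags in effect.

```cpp
// Hypothetical stand-in for __m256i: a 32-byte vector of four 64-bit lanes.
typedef long long v4di __attribute__((vector_size(32)));

// Lane-wise OR written with the plain '|' operator instead of an intrinsic.
// Operands are taken by reference so that no 32-byte vector is passed by
// value, which can trigger ABI warnings when AVX is not enabled.
inline void or_lanes(const v4di &a, const v4di &b, v4di &out)
{
  out = a | b;
}
```

The same expression compiles with or without -mavx/-mavx2; only the generated code differs.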
> One step to do first. :-)
> Currently the random number engine interface is
> inefficient since it returns a single number. What we need is an
> additional interface to return vectors.
Isn't the __generate interface good enough?
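For illustration only, a range-filling interface of the kind discussed here could look roughly like the sketch below. generate_block and draw are hypothetical names, not libstdc++ APIs, and this does not call __generate itself since its exact signature is not given in this thread. The loop still calls the engine once per element; the point is only the shape of the interface, which hands the implementation a whole range and so leaves room for a vectorized fill behind it.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill [first, last) from the engine in one call,
// rather than having the caller ask for one number at a time.
template<typename ForwardIt, typename Engine>
void generate_block(ForwardIt first, ForwardIt last, Engine &eng)
{
  for (; first != last; ++first)
    *first = eng();
}

// Hypothetical convenience wrapper: return n raw engine outputs.
inline std::vector<std::mt19937::result_type>
draw(std::mt19937 &eng, std::size_t n)
{
  std::vector<std::mt19937::result_type> out(n);
  generate_block(out.begin(), out.end(), eng);
  return out;
}
```

A caller would write something like draw(eng, 1024) and get the same sequence as 1024 individual calls to eng().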
--
Marc Glisse
PS: the patch for mixed scalar-vector operations is posted here:
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01557.html
After that, C and C++ should be at the same level.