https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93342

            Bug ID: 93342
           Summary: wrong AVX mask generation with
                    -funsafe-math-optimizations
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nathanael.schaeffer at gmail dot com
  Target Milestone: ---

When trying to produce an XOR mask to negate the even elements of an AVX vector,
gcc produces wrong code with -funsafe-math-optimizations.

I've tried several ways, all giving the same wrong answer: a mask negating ALL
elements instead of just the even ones.
Since the mask is generated using INTEGER arithmetic, I don't understand the
issue here.

With AVX, the only way I found to get correct code is to define a variable with
the mask already set.
With AVX2, one can use integer intrinsics, which produce the correct mask.

The code showing the bug can be seen here.
https://godbolt.org/z/q9eamc

For the record, I also copy the code below.
When compiling the following with -O -mavx2 -funsafe-math-optimizations -S, the
mask is wrong. Without -funsafe-math-optimizations it is correct.
Since the mask is generated using integer arithmetic, I don't understand the
issue here, as -funsafe-math-optimizations only affects floating point
(according to the man page).
Even stranger, the same mask, but now xor-ed using integer AVX2 intrinsics,
gives the correct results.

#include <immintrin.h>
typedef __m128d v2d;
typedef __m256d v4d;

// generates: vxorpd  ymm0, ymm0, YMMWORD PTR wrong_mask
v4d negate_even_fail(v4d v) {
    __m256i mask = _mm256_setr_epi32(0,-2147483648, 0,0, 0,-2147483648, 0,0);
    return _mm256_xor_pd(v, _mm256_castsi256_pd(mask));
}

// generates: vxorpd  ymm0, ymm0, YMMWORD PTR correct_mask
v4d negate_even_does_not_fail(v4d v) {
    __m256i mask = _mm256_setr_epi32(0,-2147483648, 0,0, 0,-2147483648, 0,0);
    return _mm256_castsi256_pd(_mm256_xor_si256(_mm256_castpd_si256(v), mask));
}
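
For reference, here is a minimal scalar sketch of what the mask is expected to
do, written in plain C without intrinsics. It is not part of the godbolt
example: the names lanes, mask_for_element and negate_even_ref are made up for
illustration. It assumes the x86 lane layout, where 32-bit lane 2*i is the low
half and lane 2*i+1 the high half of double element i, so that only elements 0
and 2 get their sign bit flipped.

```c
#include <stdint.h>
#include <string.h>

/* The eight 32-bit lanes passed to _mm256_setr_epi32 above;
   -2147483648 as a 32-bit value is 0x80000000 (the sign bit). */
static const uint32_t lanes[8] = {
    0, 0x80000000u, 0, 0, 0, 0x80000000u, 0, 0
};

/* 64-bit XOR mask for double element i (hypothetical helper):
   lane 2*i is the low half, lane 2*i+1 the high half. */
static uint64_t mask_for_element(int i) {
    return ((uint64_t)lanes[2 * i + 1] << 32) | lanes[2 * i];
}

/* Scalar reference: XOR each double with its mask, using integer ops only. */
static void negate_even_ref(double v[4]) {
    for (int i = 0; i < 4; i++) {
        uint64_t bits;
        memcpy(&bits, &v[i], sizeof bits);   /* reinterpret double as integer */
        bits ^= mask_for_element(i);
        memcpy(&v[i], &bits, sizeof bits);   /* and back */
    }
}
```

Calling negate_even_ref on {1.0, 2.0, 3.0, 4.0} should flip the sign of the
first and third values only. That is the behaviour I expect from both
functions above.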
