https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93342
Bug ID: 93342
Summary: wrong AVX mask generation with -funsafe-math-optimizations
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
When trying to produce an XOR mask to negate the even elements of an AVX
vector, gcc produces wrong code with -funsafe-math-optimizations.
I've tried several ways, all giving the same wrong answer: a mask that negates
ALL elements instead of just the even ones.
Since the mask is generated using INTEGER arithmetic, I don't understand the
issue here.
The only approach that gives a correct result with plain AVX is to define a
variable with the mask already set.
With AVX2, one can use integer intrinsics, which produce the correct mask.
The code showing the bug can be seen here:
https://godbolt.org/z/q9eamc
For the record, I also copy the code below.
When compiling the following with -O -mavx2 -funsafe-math-optimizations -S, the
mask is wrong. Without -funsafe-math-optimizations it is correct.
Since the mask is generated using integer arithmetic, I don't understand the
issue here, as -funsafe-math-optimizations only affects floating point
(according to the man page).
Even stranger, the same mask, when XOR-ed using integer AVX2 intrinsics,
gives the correct results.
#include <immintrin.h>
typedef __m128d v2d;
typedef __m256d v4d;
// generates: vxorpd ymm0, ymm0, YMMWORD PTR wrong_mask
v4d negate_even_fail(v4d v) {
    __m256i mask = _mm256_setr_epi32(0,-2147483648, 0,0, 0,-2147483648, 0,0);
    return _mm256_xor_pd(v, _mm256_castsi256_pd(mask));
}
// generates: vxorpd ymm0, ymm0, YMMWORD PTR correct_mask
v4d negate_even_does_not_fail(v4d v) {
    __m256i mask = _mm256_setr_epi32(0,-2147483648, 0,0, 0,-2147483648, 0,0);
    return _mm256_castsi256_pd(_mm256_xor_si256(_mm256_castpd_si256(v), mask));
}
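For reference, the result both functions are expected to produce can be sketched in plain C without intrinsics. This is only a scalar model of the intended behavior, not part of the reproducer: the names flip_sign_bit and negate_even_ref are made up for illustration. It assumes 64-bit IEEE-754 doubles, where the 32-bit value -2147483648 (0x80000000) placed in the upper half of a lane corresponds to the 64-bit sign-bit mask 0x8000000000000000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE-754 value, as on x86-64. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: flip the sign bit of one double using only
 * integer operations, mirroring what the XOR mask does per lane. */
static double flip_sign_bit(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret without aliasing issues */
    bits ^= UINT64_C(0x8000000000000000);  /* toggle only the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical scalar reference: negate elements 0 and 2 of a
 * 4-element vector in place, leaving elements 1 and 3 untouched. */
static void negate_even_ref(double v[4]) {
    for (int i = 0; i < 4; i += 2)
        v[i] = flip_sign_bit(v[i]);
}
```

Applied to {1.0, 2.0, 3.0, 4.0}, this model yields {-1.0, 2.0, -3.0, 4.0}, whereas the wrong mask described above would negate all four elements.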