https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Bug ID: 99563
Summary: Code miscompilation caused by _mm256_zeroupper()
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andysem at mail dot ru
Target Milestone: ---
Consider the following code:
#include <immintrin.h>

constexpr unsigned int block_size = 8u;

float compute_generic(const double* data, unsigned int width, unsigned int height);

inline __attribute__((always_inline))
float compute_avx(const double* data, unsigned int width, unsigned int height)
{
    __m128d mm_res = _mm_setzero_pd();
    unsigned long block_count =
        static_cast< unsigned long >((width + block_size - 1) / block_size)
        * static_cast< unsigned long >((height + block_size - 1) / block_size);
    float res = static_cast< float >(_mm_cvtsd_f64(mm_res) / static_cast< double >(block_count));
    _mm256_zeroupper();
    return res;
}

float compute(const double* data, unsigned int width, unsigned int height)
{
    if (width >= 16 && height >= block_size)
    {
        return compute_avx(data, width, height);
    }
    else
    {
        return compute_generic(data, width, height);
    }
}
$ g++ -O2 -march=sandybridge -mno-vzeroupper -c -o test.o test.cpp
https://gcc.godbolt.org/z/dhr7an
The code compiles to:
compute(double const*, unsigned int, unsigned int):
        cmp     esi, 15
        jbe     .L2
        cmp     edx, 7
        jbe     .L2
        vzeroupper
        ret
.L2:
        jmp     compute_generic(double const*, unsigned int, unsigned int)
which leaves the result of compute() uninitialized if the AVX path is taken:
the return register xmm0 is never written before ret. The problem disappears
if any one of the following is done:
- -O2 is replaced with -O1
- -mno-vzeroupper is removed
- the _mm256_zeroupper(); call is removed (the upper bits of the vector
registers are left dirty, though)
This is a regression in the gcc 10 branch and later; gcc 9.x compiles this
correctly.
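For reference, the value the AVX path is expected to return can be written in plain scalar code, since the accumulator in the test case is always zero. The following is only a sketch for comparison, not part of the original test case: the names ref_block_size, compute_avx_reference and check_reference are hypothetical, and the code deliberately avoids intrinsics so that it builds without any -march flags.

```cpp
#include <cassert>

// Hypothetical stand-in for block_size from the test case.
constexpr unsigned int ref_block_size = 8u;

// Hypothetical scalar reference for compute_avx(): the accumulator is zero,
// so the result is 0.0 divided by the number of blocks.
inline float compute_avx_reference(unsigned int width, unsigned int height)
{
    unsigned long block_count =
        static_cast< unsigned long >((width + ref_block_size - 1) / ref_block_size)
        * static_cast< unsigned long >((height + ref_block_size - 1) / ref_block_size);
    return static_cast< float >(0.0 / static_cast< double >(block_count));
}

// Hypothetical self-check: both inputs satisfy width >= 16 && height >= 8,
// i.e. they would take the AVX path in compute().
inline void check_reference()
{
    assert(compute_avx_reference(16u, 8u) == 0.0f);
    assert(compute_avx_reference(1920u, 1080u) == 0.0f);
}
```

A correctly compiled compute() should agree with this reference for any input that takes the AVX path, whereas the code generated above returns whatever happens to be in xmm0.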