https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71903
Bug ID: 71903
Summary: Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: carlosrafael.prog at gmail dot com
Target Milestone: ---
I have the following code:
    float *previousM = ...;
    float *fft = ...;
    for (int32_t i = 0; i < 256; i += 8) {
        __m128 m0 = _mm_load_ps(previousM);
        __m128 m1 = _mm_load_ps(previousM + 4);
        previousM += 8;
        __m128 old0 = _mm_load_ps(fft);
        __m128 old1 = _mm_load_ps(fft + 4);
        __m128 geq0 = _mm_cmpge_ps(m0, old0);
        __m128 geq1 = _mm_cmpge_ps(m1, old1);
        ...
    }
Since the code was behaving rather strangely, I decided to generate and read
its disassembly (below is the snippet that drew my attention):
    extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
    __artificial__)) _mm_cmpge_ps (__m128 __A, __m128 __B)
    {
      return (__m128) __builtin_ia32_cmpgeps ((__v4sf)__A, (__v4sf)__B);
      9f:   0f c2 dd 02             cmpleps %xmm5,%xmm3
Please note that this is not a bug in the disassembler: the Intel docs state
that CMPLEPS xmm1, xmm2 is a pseudo-op for CMPPS xmm1, xmm2, 2, and the encoded
instruction above ends in the immediate byte 02.
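The mapping from the instruction bytes to the mnemonic can also be checked by
hand. The following is a minimal sketch, assuming the register-to-register form
of the legacy SSE encoding (0f c2 /r ib) and the eight legacy predicates (eq,
lt, le, unord, neq, nlt, nle, ord). The names decode_cmpps, format_att, and
struct cmpps_fields are hypothetical helpers written for this illustration;
they are not part of gcc or binutils.

```c
#include <stdint.h>
#include <stdio.h>

/* Predicate names for the legacy CMPPS imm8 values 0..7. */
static const char *const cmpps_predicate[8] = {
    "eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord"
};

/* Decoded fields of a register-to-register CMPPS (hypothetical helper type). */
struct cmpps_fields {
    int dest_xmm;   /* ModRM.reg: first operand, also the destination */
    int src_xmm;    /* ModRM.rm: second operand */
    int predicate;  /* low three bits of imm8 */
};

/* Decode the 4 bytes "0f c2 /r ib". Returns 0 on success, -1 otherwise. */
static int decode_cmpps(const uint8_t bytes[4], struct cmpps_fields *out)
{
    if (bytes[0] != 0x0f || bytes[1] != 0xc2)
        return -1;                      /* not a CMPPS opcode */
    if ((bytes[2] >> 6) != 3)
        return -1;                      /* only the register-direct form is handled */
    out->dest_xmm = (bytes[2] >> 3) & 7;
    out->src_xmm = bytes[2] & 7;
    out->predicate = bytes[3] & 7;
    return 0;
}

/* Format in AT&T operand order (source first, destination last). */
static void format_att(const struct cmpps_fields *f, char *buf, size_t n)
{
    snprintf(buf, n, "cmp%sps %%xmm%d,%%xmm%d",
             cmpps_predicate[f->predicate], f->src_xmm, f->dest_xmm);
}
```

Under these assumptions, the bytes 0f c2 dd 02 decode to destination xmm3,
source xmm5, and predicate 2 (le), which agrees with the disassembler's
"cmpleps %xmm5,%xmm3". In AT&T syntax the destination is the last operand, so
the instruction computes xmm3 <= xmm5 and stores the mask in xmm3.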
Also, this is not some weird optimization or anything else, because even if the
compiler had decided to switch m0 with old0, the opposite of >= (ge) is < (lt)
and not <= (le), as the disassembly shows.
In order to make the code work properly, I manually replaced these two lines in
my code

    __m128 geq0 = _mm_cmpge_ps(m0, old0);
    __m128 geq1 = _mm_cmpge_ps(m1, old1);

with these two lines

    __m128 geq0 = _mm_cmplt_ps(old0, m0);
    __m128 geq1 = _mm_cmplt_ps(old1, m1);

After that change, the disassembly became
    extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
    __artificial__)) _mm_cmplt_ps (__m128 __A, __m128 __B)
    {
      return (__m128) __builtin_ia32_cmpltps ((__v4sf)__A, (__v4sf)__B);
      8d:   0f c2 e3 01             cmpltps %xmm3,%xmm4
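The operand-swap reasoning can be checked with a scalar model of one lane. This
is a minimal sketch under stated assumptions, not the compiler's
implementation: each lane is modelled with ordinary C float comparisons (which,
like the le and lt predicates, are false when either operand is NaN), and the
helper names lane_ge, lane_le, lane_lt, ge_equals_swapped_le, and
ge_equals_swapped_lt are hypothetical. Note that the legacy CMPPS immediate has
no "ge" predicate, so a >= comparison has to be expressed through one of the
eight available predicates.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* One lane of a >= b, the result requested by _mm_cmpge_ps(a, b). */
static bool lane_ge(float a, float b) { return a >= b; }

/* One lane of CMPPS with predicate 2 (le): dest <= src. */
static bool lane_le(float dest, float src) { return dest <= src; }

/* One lane of CMPPS with predicate 1 (lt): dest < src. */
static bool lane_lt(float dest, float src) { return dest < src; }

/* Sample values, including equal pairs, infinity, and NaN. */
static const float samples[] = { -1.0f, 0.0f, 1.0f, 2.5f, INFINITY, NAN };
#define N_SAMPLES (sizeof samples / sizeof samples[0])

/* True if a >= b gives the same result as b <= a on every sample pair. */
static bool ge_equals_swapped_le(void)
{
    for (size_t i = 0; i < N_SAMPLES; i++)
        for (size_t j = 0; j < N_SAMPLES; j++)
            if (lane_ge(samples[i], samples[j]) != lane_le(samples[j], samples[i]))
                return false;
    return true;
}

/* True if a >= b gives the same result as b < a on every sample pair. */
static bool ge_equals_swapped_lt(void)
{
    for (size_t i = 0; i < N_SAMPLES; i++)
        for (size_t j = 0; j < N_SAMPLES; j++)
            if (lane_ge(samples[i], samples[j]) != lane_lt(samples[j], samples[i]))
                return false;
    return true;
}
```

In this model, swapping the operands turns >= into <= (a >= b holds exactly
when b <= a), whereas < is the logical negation of >= for ordered values, not
its operand-swapped form. The replacement _mm_cmplt_ps(old0, m0) computes
old0 < m0, which differs from m0 >= old0 in lanes where the two values are
equal.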
Some extra information:
- I am using the gcc bundled with the Android build tools. Since there are two
executable files, I do not know for sure whether the version of gcc being used
is "4.8" or "4.9 20140827".
- I am compiling on 64-bit Windows 10, targeting a 32-bit x86 Android app.
- Both gcc versions (4.8 and 4.9) are inside the folder windows-x86_64 (which
makes me believe I am using a 64-bit build of gcc).