[Bug c++/71903] New: Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics

2016-07-16 Thread carlosrafael.prog at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71903

            Bug ID: 71903
           Summary: Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: carlosrafael.prog at gmail dot com
  Target Milestone: ---

I have the following code:

float *previousM = ...;
float *fft = ...;

for (int32_t i = 0; i < 256; i += 8) {
    __m128 m0 = _mm_load_ps(previousM);
    __m128 m1 = _mm_load_ps(previousM + 4);
    previousM += 8;

    __m128 old0 = _mm_load_ps(fft);
    __m128 old1 = _mm_load_ps(fft + 4);

    __m128 geq0 = _mm_cmpge_ps(m0, old0);
    __m128 geq1 = _mm_cmpge_ps(m1, old1);
    ...
}

Since the code was behaving rather strangely, I decided to generate and read
its disassembly (below is the snippet that drew my attention):

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
__artificial__)) _mm_cmpge_ps (__m128 __A, __m128 __B)
{
  return (__m128) __builtin_ia32_cmpgeps ((__v4sf)__A, (__v4sf)__B);
  9f:   0f c2 dd 02 cmpleps %xmm5,%xmm3

Please note that this is not a bug in the disassembler: the Intel docs state
that CMPLEPS xmm1, xmm2 is encoded as CMPPS xmm1, xmm2, 2, which matches the
trailing 02 byte above.
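
To make the byte-level reading concrete, here is a minimal sketch that maps
the immediate byte of a register-to-register CMPPS (0F C2 /r ib) to its
pseudo-op name. The helper names cmpps_predicate, cmpps_mnemonic and
check_report_encodings are hypothetical and exist only for illustration; the
byte sequences are the two quoted in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CMPPS pseudo-op names, indexed by the SSE predicate immediate (0..7). */
static const char *const cmpps_names[8] = {
    "cmpeqps",  "cmpltps",  "cmpleps",  "cmpunordps",
    "cmpneqps", "cmpnltps", "cmpnleps", "cmpordps"
};

/* Hypothetical helper: predicate of a register-to-register CMPPS,
   or -1 if the four bytes are not such an instruction. */
static int cmpps_predicate(const uint8_t b[4])
{
    if (b[0] != 0x0f || b[1] != 0xc2)
        return -1;                  /* opcode is not CMPPS */
    if ((b[2] & 0xc0) != 0xc0)
        return -1;                  /* ModRM is not register-direct */
    if (b[3] > 7)
        return -1;                  /* SSE defines predicates 0..7 only */
    return b[3];
}

/* Hypothetical helper: pseudo-op name, or NULL if not recognised. */
static const char *cmpps_mnemonic(const uint8_t b[4])
{
    int p = cmpps_predicate(b);
    return p < 0 ? NULL : cmpps_names[p];
}

/* The two encodings quoted in this report. */
static void check_report_encodings(void)
{
    const uint8_t ge_site[4] = {0x0f, 0xc2, 0xdd, 0x02};
    const uint8_t lt_site[4] = {0x0f, 0xc2, 0xe3, 0x01};
    assert(cmpps_predicate(ge_site) == 2);  /* LE */
    assert(cmpps_predicate(lt_site) == 1);  /* LT */
}
```

Note that the predicate table has no "greater or equal" entry; the SSE
immediates only cover EQ, LT, LE, UNORD and their negations.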

Also, this is not some weird optimization or anything else, because even if the
compiler had decided to switch m0 with old0, the opposite of >= (ge) is < (lt)
and not <= (le), as the disassembly shows.
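
A note on the operand-swap argument, offered as a sketch rather than a
statement about what this particular gcc build emitted: swapping the operands
of a comparison mirrors it instead of negating it, so a >= b is lane-wise the
same test as b <= a (both are false when a lane holds a NaN). Under that
reading, an LE predicate with swapped operands is a valid way to compute
_mm_cmpge_ps. The helper names below are hypothetical; on 32-bit x86 the code
assumes SSE is enabled (for example with -msse).

```c
#include <assert.h>
#include <math.h>
#include <xmmintrin.h>

/* 4-bit lane mask of a >= b, using the intrinsic from the report. */
static int ge_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpge_ps(a, b));
}

/* 4-bit lane mask of b <= a: LE predicate with the operands swapped. */
static int swapped_le_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmple_ps(b, a));
}

static void check_ge_equals_swapped_le(void)
{
    /* Lanes, low to high: greater, equal, less, NaN. */
    __m128 a = _mm_setr_ps(2.0f, 1.0f, 0.0f, NAN);
    __m128 b = _mm_setr_ps(1.0f, 1.0f, 1.0f, 1.0f);

    assert(ge_mask(a, b) == swapped_le_mask(a, b));
    assert(ge_mask(a, b) == 0x3);   /* only lanes 0 and 1 are set */
}
```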

In order to make the code work properly, I manually replaced these two lines in
my code

__m128 geq0 = _mm_cmpge_ps(m0, old0);
__m128 geq1 = _mm_cmpge_ps(m1, old1);

with these two lines

__m128 geq0 = _mm_cmplt_ps(old0, m0);
__m128 geq1 = _mm_cmplt_ps(old1, m1);

After that change, the disassembly became

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
__artificial__)) _mm_cmplt_ps (__m128 __A, __m128 __B)
{
  return (__m128) __builtin_ia32_cmpltps ((__v4sf)__A, (__v4sf)__B);
  8d:   0f c2 e3 01 cmpltps %xmm3,%xmm4
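
One thing worth checking about this replacement: _mm_cmplt_ps(old0, m0) tests
old0 < m0, which is m0 > old0, so it differs from _mm_cmpge_ps(m0, old0) in
every lane where the two values are equal. A minimal sketch with made-up input
values (the function name is hypothetical; SSE must be enabled when targeting
32-bit x86):

```c
#include <assert.h>
#include <xmmintrin.h>

static void check_swapped_lt_is_strict(void)
{
    /* Lanes, low to high: greater, equal, less, equal. */
    __m128 m   = _mm_setr_ps(2.0f, 1.0f, 0.0f, 1.0f);
    __m128 old = _mm_set1_ps(1.0f);

    int ge = _mm_movemask_ps(_mm_cmpge_ps(m, old));  /* m >= old */
    int lt = _mm_movemask_ps(_mm_cmplt_ps(old, m));  /* old < m, i.e. m > old */

    assert(ge == 0xb);  /* lanes 0, 1 and 3: equal lanes count */
    assert(lt == 0x1);  /* lane 0 only: equal lanes do not count */
}
```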

Just as an extra piece of information:
- I am using the gcc bundled with Android build tools, and since there are two
executable files, I do not know for sure if the version of the gcc being used
is "4.8" or "4.9 20140827"
- I am compiling under a 64-bit Windows 10, targeting a 32-bit x86 Android app
- Both gcc versions (4.8 and 4.9) are inside the folder windows-x86_64 (which
makes me believe I am using a 64-bit build of gcc)

[Bug target/71903] Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics

2016-07-17 Thread carlosrafael.prog at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71903

--- Comment #2 from Carlos Rafael ---
(In reply to Mikael Pettersson from comment #1)
> Can you add a standalone (compilable and runnable) test case?

I beg your pardon, Mikael. It was my bad! After submitting the bug here, I
still could not believe that there was a bug in gcc, and I kept testing all
night long.

It turned out I was linking the library and generating the disassembly against
an outdated version of the compiled code.

After fixing my mistake, I tested the code and it worked with both _mm_cmpge_ps
and _mm_cmplt_ps.

Can you delete this bug, or close it? Or how can I do it?

[Bug target/71903] Wrong opcode using x86 SSE _mm_cmpge_ps intrinsics

2016-07-18 Thread carlosrafael.prog at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71903

--- Comment #5 from Carlos Rafael ---
(In reply to Mikael Pettersson from comment #3)
> No worries.  As the reporter you should be able to resolve it as "invalid".

Ok! Thanks!