[Bug c/44472] New: __builtin_ia32_pcmpestri128 generates an additional pcmpestrm operation

2010-06-08 Thread jeff_wegher at yahoo dot com
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/home/jeff/gnu/TR
--program-suffix=TR --enable-languages=c,c++
Thread model: posix
gcc version 4.6.0 20100608 (experimental) (GCC) 

While running some tests against SSE4.2 instructions, I noticed that the
__builtin_ia32_pcmpestri128 method generates the correct pcmpestri call
followed immediately by an extraneous pcmpestrm call.  The second call goes
away when compiled with any optimization level.

A very simple test program requiring no pre-processing:

BEGIN SAMPLE: sseTest2.c
typedef long long __m128i __attribute__ ((__vector_size__ (16),
__may_alias__));
typedef char __v16qi __attribute__ ((__vector_size__ (16)));

int
main()
{
__v16qi c = (__v16qi){ 'K' };
__v16qi str1 =
(__v16qi){'A','B','C','D','E','F','G','H','I','J','K','L','M'};

int v = __builtin_ia32_pcmpestri128(c, 1, str1, 13, 0);

return v;
}
END SAMPLE

Building with:
~/gnu/TR/bin/gccTR -S -msse4.2 sseTest2.c -o sseTest2.nonoptimized.s

shows the extra opcode:
movdqa  .LC0(%rip), %xmm0
movdqa  %xmm0, -32(%rbp)
movdqa  .LC1(%rip), %xmm0
movdqa  %xmm0, -48(%rbp)
movdqa  -48(%rbp), %xmm1
movdqa  -32(%rbp), %xmm0
movl$1, %eax
movl$13, %edx
pcmpestri   $0, %xmm1, %xmm0
pcmpestrm   $0, %xmm1, %xmm0
movl%ecx, -4(%rbp)
movl-4(%rbp), %eax
leave


Building with:
~/gnu/TR/bin/gccTR -S -O -msse4.2 sseTest2.c -o sseTest2.optimized.s

shows no extra opcode:
movdqa  .LC0(%rip), %xmm0
movl$1, %eax
movl$13, %edx
pcmpestri   $0, .LC1(%rip), %xmm0
movl%ecx, %eax
ret


-- 
   Summary: __builtin_ia32_pcmpestri128 generates an additional
pcmpestrm operation
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
          Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jeff_wegher at yahoo dot com
 GCC build triplet: 4.6.0
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44472



[Bug c/44472] __builtin_ia32_pcmpestri128 generates an additional pcmpestrm operation

2010-06-08 Thread jeff_wegher at yahoo dot com


--- Comment #1 from jeff_wegher at yahoo dot com  2010-06-09 02:38 ---
Created an attachment (id=20870)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20870&action=view)
Example program


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44472



[Bug c/44472] __builtin_ia32_pcmpestri128 generates an additional pcmpestrm operation

2010-06-08 Thread jeff_wegher at yahoo dot com


--- Comment #3 from jeff_wegher at yahoo dot com  2010-06-09 05:03 ---
Sure, non-optimized code generates a lot of instructions that aren't strictly
needed.  This one seemed a little beyond that to me.

Just so I understand then...  By design, __builtin*_pcmpestri and
__builtin*_pcmpestrm both generate the same pair of opcodes to calculate both
the index and the mask and then allow the optimizer to remove the value that
isn't used?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44472