Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/home/jeff/gnu/TR
--program-suffix=TR --enable-languages=c,c++
Thread model: posix
gcc version 4.6.0 20100608 (experimental) (GCC)
While running some tests against SSE4.2 instructions, I noticed that the
__builtin_ia32_pcmpestri128 builtin generates the correct pcmpestri
instruction followed immediately by an extraneous pcmpestrm instruction.
The second instruction goes away when compiling at any optimization level.
A very simple test program requiring no pre-processing:
BEGIN SAMPLE: sseTest2.c
typedef long long __m128i __attribute__ ((__vector_size__ (16),
                                          __may_alias__));
typedef char __v16qi __attribute__ ((__vector_size__ (16)));

int
main()
{
  __v16qi c = (__v16qi){ 'K' };
  __v16qi str1 =
    (__v16qi){'A','B','C','D','E','F','G','H','I','J','K','L','M'};
  int v = __builtin_ia32_pcmpestri128(c, 1, str1, 13, 0);
  return v;
}
END SAMPLE
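
For reference, here is a rough scalar model of the value the sample is
expected to leave in %ecx. It is only a sketch, under the assumption that
imm8 = 0 selects unsigned bytes, the "equal any" comparison, positive
polarity and the least significant index, as the Intel documentation
describes. The names saturate_len, model_pcmpestri_imm0 and model_sample
are hypothetical helpers made up for this illustration and are not part of
GCC or of any header.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit lengths are taken by absolute value and saturated at 16
   (the number of byte elements in a 128-bit register). */
static int saturate_len(int len)
{
    if (len > 16 || len < -16)
        return 16;
    return len < 0 ? -len : len;
}

/* Hypothetical scalar model of pcmpestri with imm8 = 0:
   return the index of the first valid byte of str that equals any
   valid byte of set, or 16 when there is no match. */
int model_pcmpestri_imm0(const unsigned char set[16], int set_len,
                         const unsigned char str[16], int str_len)
{
    assert(set != NULL && str != NULL);

    int nset = saturate_len(set_len);
    int nstr = saturate_len(str_len);

    for (int j = 0; j < nstr; j++)
        for (int i = 0; i < nset; i++)
            if (str[j] == set[i])
                return j;   /* least significant matching index */

    return 16;              /* no match: index equals element count */
}

/* Same inputs as sseTest2.c: set = "K" (length 1),
   string = "ABCDEFGHIJKLM" (length 13). */
int model_sample(void)
{
    const unsigned char c[16] = { 'K' };
    const unsigned char str1[16] =
        {'A','B','C','D','E','F','G','H','I','J','K','L','M'};

    return model_pcmpestri_imm0(c, 1, str1, 13);
}
```

Under these assumptions model_sample() returns 10, the zero-based position
of 'K' in str1. The model says nothing about code generation; it only
describes the result that a single pcmpestri should produce for these
inputs.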
Building with:
~/gnu/TR/bin/gccTR -S -msse4.2 sseTest2.c -o sseTest2.nonoptimized.s
shows the extra instruction:
movdqa .LC0(%rip), %xmm0
movdqa %xmm0, -32(%rbp)
movdqa .LC1(%rip), %xmm0
movdqa %xmm0, -48(%rbp)
movdqa -48(%rbp), %xmm1
movdqa -32(%rbp), %xmm0
movl $1, %eax
movl $13, %edx
pcmpestri $0, %xmm1, %xmm0
pcmpestrm $0, %xmm1, %xmm0
movl %ecx, -4(%rbp)
movl -4(%rbp), %eax
leave
Building with:
~/gnu/TR/bin/gccTR -S -O -msse4.2 sseTest2.c -o sseTest2.optimized.s
shows no extra instruction:
movdqa .LC0(%rip), %xmm0
movl $1, %eax
movl $13, %edx
pcmpestri $0, .LC1(%rip), %xmm0
movl %ecx, %eax
ret
--
Summary: __builtin_ia32_pcmpestri128 generates an additional
pcmpestrm operation
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jeff_wegher at yahoo dot com
GCC build triplet: 4.6.0
GCC host triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44472