http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46209
           Summary: pmovmskb, useless sign extension
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: tbp...@gmail.com
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

Hello,

$ cat movemsk.c
#include <xmmintrin.h>
typedef long unsigned int uint64_t;
uint64_t foo128(__m128i x) { return _mm_movemask_epi8(x); }
uint64_t foo64(__m64 x) { return _mm_movemask_pi8(x); }

$ /usr/local/gcc-4.6-20101026/bin/gcc -O3 -march=native movemsk.c -S -o -
foo128:
.LFB516:
	.cfi_startproc
	pmovmskb %xmm0, %eax
	cltq
	ret
foo64:
.LFB517:
	.cfi_startproc
	movdq2q %xmm0, %mm0
	movq %xmm0, -8(%rsp)
	pmovmskb %mm0, %eax
	cltq
	ret

I won't discuss the interesting MMX code generation, except to point out that
in both cases, per the Intel documentation, there is no need to extend the
result (the cltq); a sign extension is even slightly more wrong.

$ /usr/local/gcc-4.6-20101026/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-4.6-20101026/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.6-20101026/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr/local/gcc-4.6.0 --enable-languages=c,c++ --enable-threads=posix --disable-nls --with-system-zlib --disable-bootstrap --enable-mpfr --enable-gold --enable-lto --with-ppl --with-cloog --with-arch=native --enable-checking=release
Thread model: posix
gcc version 4.6.0 20101026 (experimental) (GCC)
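
To illustrate why the extension is redundant, here is a minimal scalar sketch
of what pmovmskb computes for a 128-bit operand, assuming the documented
behaviour that only the low 16 bits of the destination can be set and the
rest are zeroed. The names movemask_epi8_model, widen_signed and
widen_unsigned are hypothetical helpers made up for this illustration; they
are not part of any intrinsic header and this is not what GCC does
internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of PMOVMSKB on 16 bytes (hypothetical helper):
 * bit i of the result is the most significant bit of byte i,
 * and every bit above bit 15 stays zero. */
static uint32_t movemask_epi8_model(const uint8_t bytes[16])
{
    uint32_t mask = 0;
    for (size_t i = 0; i < 16; ++i)
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    return mask;
}

/* Widening the 32-bit result as a signed int, which is what cltq does. */
static uint64_t widen_signed(uint32_t m)
{
    return (uint64_t)(int64_t)(int32_t)m;
}

/* Widening it as unsigned, which a 32-bit register write on x86-64
 * already provides because the upper half of the register is cleared. */
static uint64_t widen_unsigned(uint32_t m)
{
    return (uint64_t)m;
}
```

Because the model never sets bit 31, widen_signed and widen_unsigned agree on
every mask it can produce (0 through 0xFFFF), so the cltq after pmovmskb
cannot change the value returned by foo128.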