Hi,

A way to fill an xmm register with all ones is to use _mm_cmpeq_epi{8,16,32} to compare the register with itself.
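For reference, a minimal sketch of the idiom in C. It assumes an x86 target with SSE2 and <emmintrin.h>. The helper names all_ones_si128 and all_bits_set are made up for this illustration and do not come from any header. Unlike the snippets in this report, the sketch compares a zeroed register with itself so that no uninitialized value is read. Which instructions the compiler actually emits for it depends on the gcc version and flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not part of emmintrin.h): returns a vector with
   all 128 bits set. Every 32-bit lane compares equal to itself, so
   pcmpeqd writes 0xffffffff into each lane. */
static inline __m128i all_ones_si128(void)
{
    __m128i z = _mm_setzero_si128();  /* defined input, avoids reading garbage */
    return _mm_cmpeq_epi32(z, z);
}

/* Hypothetical checker: returns 1 if every byte of v is 0xff, else 0. */
static int all_bits_set(__m128i v)
{
    uint8_t bytes[16];
    int i;

    _mm_storeu_si128((__m128i *)bytes, v);  /* unaligned store is always safe */
    for (i = 0; i < 16; i++)
        if (bytes[i] != 0xff)
            return 0;
    return 1;
}
```

The same value can also be requested with _mm_set1_epi32 (-1), which leaves the choice between a compare and a constant load to the compiler.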
However, if you write:

    __m128i r;
    r = _mm_cmpeq_epi32 (r, r);

gcc insists on clearing the register first and generates (this is the
output of objdump -d, compiled with -O3 -march=core2):

    401484: 66 0f ef c0    pxor    %xmm0,%xmm0
    401488: 66 0f 74 c0    pcmpeqw %xmm0,%xmm0

It does not discover that the result is independent of the initial value
of r, so it emits a redundant pxor before the compare.

Similarly, if one writes (code adapted from _mm_setzero_si128 (void) in
emmintrin.h):

    __m128i r = __extension__ (__m128i)(__v4si){ 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };

then this generates a memory load instead of the optimized pcmpeqw
instruction.

I would expect both

    __m128i r;
    r = _mm_cmpeq_epi32 (r, r);

and

    __m128i r = __extension__ (__m128i)(__v4si){ 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };

to generate the same single instruction:

    pcmpeqw %xmm0, %xmm0

exactly as

    __m128i r;
    r = _mm_xor_si128 (r, r);

and

    __m128i r = __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 };

both output

    pxor %xmm0, %xmm0

Best regards.
Antoine

-- 
Summary: Filling xmm register with all bit set is not optimized
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: etjq78kl at free dot fr
GCC target triplet: i?86

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084