http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55461
Bug #: 55461
Summary: _mm_loadu_si128 generates wrong instruction on 4.8
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

The following test case, compiled with -Os -msse4.2, generates this with 4.8:

.L3:
        movups  (%rdi), %xmm0    <--------------- wrong data type, should be movupd
        pcmpistrm       $24, (%rdi,%rsi), %xmm0
        jna     .L2
        addq    $16, %rdi
        jmp     .L3

4.7 gives the correct instruction (although it makes a mess out of the loop):

.L3:
        movdqu  (%rdi,%rsi), %xmm1
        movdqu  (%rdi), %xmm2
        pcmpistrm       $24, %xmm1, %xmm2
        jna     .L2
        addq    $16, %rdi
        jmp     .L3

A simpler test case gives the correct movupd, so it must be related to the
funky pointer arithmetic the test case does. But in any case it should not
turn an integer vector into a float vector.

#include <nmmintrin.h>

int c_strcmp(char *a, char *b)
{
    unsigned long diff = (unsigned long)b - (unsigned long)a;
    int r = 16;

    a -= r;
    for (;;) {
        if (_mm_cmpistra(_mm_loadu_si128((__m128i *)a),
                         _mm_loadu_si128((__m128i *)((unsigned long)a + diff)),
                         0x18)) {
            a += r;
            continue;
        }
        /* check C here */
        return 0;
    }
    return 0;
}
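
The simpler test case mentioned above is not included in the report. As a rough illustration only, a reduced check might look like the sketch below. It assumes an x86-64 target and uses only SSE2 intrinsics, so -msse4.2 is not required. The function name zero_byte_mask is hypothetical and does not come from the report. Compiling it with gcc -Os -S and reading the generated assembly shows which instruction the compiler picked for the unaligned integer load.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128 and friends */

/* Hypothetical helper, not from the bug report: load 16 bytes from a
   possibly unaligned address as an integer vector, then return a bitmask
   with bit i set when byte i is zero. */
int zero_byte_mask(const char *p)
{
    /* The unaligned integer load whose instruction selection is in question. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);

    /* Byte-wise compare against zero: matching bytes become 0xFF. */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());

    /* Collect the top bit of each byte into a 16-bit mask. */
    return _mm_movemask_epi8(eq);
}
```

Because this version has no pointer-difference arithmetic, it may well not trigger the problem. It is meant as a baseline to compare against the full test case.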