Here is a simple example that shows the bug when compiled with -arch i386 -O2 -msse3 -funroll-loops -ftree-vectorize:

    #include <cstdio>
    #include <stdint.h>

    int main (int, char * const)
    {
        const int count = 5;
        uint32_t x[count];

        for (int i = 0; i < count; ++i)
            x[i] = 1;
        for (int i = 0; i < count; ++i)
            x[i] = x[i] << 24;
        for (int i = 0; i < count; ++i)
            std::printf("%x ", x[i]);
        return 0;
    }

The compiler is vectorizing the shifts in the loop, but it's generating the wrong shift constant. It's putting the shift value (24) into each element of the __v4si vector instead of setting the __m128 value as a whole to 24. Because pslld with a register operand takes its count from the low 64 bits of that register, each element is shifted by far more than 24 and the result is 0. In the listing below, the constant vector is loaded from address 8208.

    <+0029> movdqa 8208,%xmm0
    <+0037> movdqa (%eax),%xmm1
    <+0041> pslld  %xmm0,%xmm1

--
Summary: sse autovectorizer emits wrong code involving shifts
Product: gcc
Version: 4.0.1
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: elronayellin at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28007