The following code is miscompiled at -O3 with gcc 4.3.0; g[1] is filled with movq+movlps instead of movq+movhps.

void frob(long long *t, const long long *s1, const long long *s2)
{
    long long w_s2[2];
    long long z;
    z = s2[0];
    w_s2[0] = z & 0x1fUL;
    z >>= 5;
    w_s2[1] = z;

    typedef union {
        __v2di s;
        long long x[2];
    } __v2di_proxy;

    __v2di g[4];
    g[0] = (__v2di) { 0,};
    g[1] = (__v2di) { w_s2[0], w_s2[1],};
    // it's unused in my testcase, and makes the assembly diff more
    // readable.
    // g[2] = (__v2di) { 0,};
    g[3] = g[1];

    __v2di_proxy r;
    r.s = g[s1[0]];
    t[0] = r.x[0];
    t[1] = r.x[1];
}

Here is the diff of the generated asm. A full testcase follows in the form of a tar file.

 frob:
 	movq	(%rdx), %rax
 	pxor	%xmm0, %xmm0
 	movdqa	%xmm0, -72(%rsp)
 	movq	%rax, %rdx
 	andl	$31, %edx
 	movq	%rdx, -96(%rsp)
 	sarq	$5, %rax
-	movq	%rax, -104(%rsp)
-	movq	-96(%rsp), %xmm1
-	movhps	-104(%rsp), %xmm1
+	movq	%rax, -112(%rsp)
+	movq	-112(%rsp), %xmm1
+	movlps	-96(%rsp), %xmm1

Notice how the shifted rax stored in -112(%rsp) goes to xmm1 with movq while it should reach the high word.

Have I done anything wrong?

E.

-- 
Summary: wrong code for loading an sse2 register
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Emmanuel dot Thome at inria dot fr
GCC build triplet: x86_64-redhat-linux
GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37340
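
For readers who want to see what each of the two load sequences leaves in xmm1, here is a small plain-C model. It is only a sketch: the type xmm_model and every function name in it are made up for illustration and are not part of the testcase or of any GCC header. It assumes the usual semantics of the memory-operand forms: movq loads the low 64 bits and zeroes the high 64 bits, movlps replaces only the low 64 bits, and movhps replaces only the high 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit SSE register as two 64-bit lanes. */
typedef struct {
    uint64_t lo;   /* bits 0..63   */
    uint64_t hi;   /* bits 64..127 */
} xmm_model;

/* movq mem, %xmm: load the low lane, zero the high lane. */
xmm_model model_movq(uint64_t mem)
{
    xmm_model r = { mem, 0 };
    return r;
}

/* movlps mem, %xmm: overwrite the low lane, keep the high lane. */
xmm_model model_movlps(xmm_model r, uint64_t mem)
{
    r.lo = mem;
    return r;
}

/* movhps mem, %xmm: overwrite the high lane, keep the low lane. */
xmm_model model_movhps(xmm_model r, uint64_t mem)
{
    r.hi = mem;
    return r;
}

/* movq + movhps: w0 goes to the low lane, then w1 to the high lane. */
xmm_model sequence_movq_movhps(uint64_t w0, uint64_t w1)
{
    return model_movhps(model_movq(w0), w1);
}

/* movq + movlps: w1 goes to the low lane, then w0 overwrites it. */
xmm_model sequence_movq_movlps(uint64_t w0, uint64_t w1)
{
    return model_movlps(model_movq(w1), w0);
}
```

Under this model, with w0 standing for w_s2[0] and w1 for w_s2[1], the movq+movhps sequence yields { w0, w1 }, while the movq+movlps sequence yields { w0, 0 }: the shifted value is overwritten in the low lane and never reaches the high lane.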