http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48678
--- Comment #12 from Uros Bizjak <ubizjak at gmail dot com> 2011-04-20 14:27:30 UTC ---

Hm, if line 14 in the testcase is changed to:

-  ((T *) &s.d)[0] = *x;
+  ((T *) &s.d)[1] = *x;

then gcc does not touch the movstrict pattern at all and generates the following code:

	movdqa	(%rsi), %xmm0
	movabsq	$-4294901761, %rsi
	movzwl	(%rdi), %eax
	movdqa	%xmm0, -24(%rsp)
	movq	-24(%rsp), %rcx
	salq	$16, %rax
	andq	%rsi, %rcx
	orq	%rax, %rcx
	movq	%rcx, -24(%rsp)
	movdqa	-24(%rsp), %xmm0
	pcmpeqw	(%rdx), %xmm0
	ret

However, when the byte offset reaches sizeof (void *), i.e. 8 bytes on a 64-bit target, as in:

-  ((T *) &s.d)[0] = *x;
+  ((T *) &s.d)[4] = *x;

then we again get:

	movdqa	(%rsi), %xmm0
	pinsrw	$4, (%rdi), %xmm0
	pcmpeqw	(%rdx), %xmm0
	ret

I didn't investigate this in detail, but perhaps someone can shed some light here?