https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Also notice an interesting case impacted by a separate m alternative.

typedef long v2di __attribute__((vector_size(16)));

v2di
foo (v2di a)
{
  a[1] = 1113;
  return a;
}

With -O2, gcc generates

foo(long __vector(2)):
        movhps  .LC0(%rip), %xmm0
        ret
.LC0:
        .quad   1113

llvm has

foo(long __vector(2)):                  # @foo(long __vector(2))
        movl    $1113, %eax             # imm = 0x459
        movq    %rax, %xmm1
        punpcklqdq      %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0]
        retq

Microbenchmarks show both sequences are almost equally fast; I really don't
know which is better.
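
For anyone who wants to experiment with the two sequences, here is a minimal sketch that spells out each strategy with SSE2 intrinsics. It assumes an x86-64 LP64 target (64-bit long). The names set_hi_mem, set_hi_reg and check_variants are hypothetical and not part of the report, and the compiler may still select different instructions than the ones mentioned in the comments.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <string.h>

typedef long v2di __attribute__((vector_size(16)));

/* Memory-operand strategy: load 64 bits from memory into the high half,
   keeping the low half of a (the shape of the movhps sequence). */
v2di set_hi_mem(v2di a, const long *p)
{
    double bits;
    memcpy(&bits, p, sizeof bits);           /* bit copy, avoids aliasing issues */
    return (v2di)_mm_loadh_pd((__m128d)a, &bits);
}

/* Register strategy: move the scalar into a vector register, then
   interleave the low quadwords (the shape of the movq + punpcklqdq sequence). */
v2di set_hi_reg(v2di a, long x)
{
    __m128i hi = _mm_cvtsi64_si128(x);
    return (v2di)_mm_unpacklo_epi64((__m128i)a, hi);
}

/* Sanity check: both strategies must agree with a plain element store. */
void check_variants(v2di a, long x)
{
    v2di expect = a;
    expect[1] = x;
    v2di m = set_hi_mem(a, &x);
    v2di r = set_hi_reg(a, x);
    assert(m[0] == expect[0] && m[1] == expect[1]);
    assert(r[0] == expect[0] && r[1] == expect[1]);
}
```

Calling either function in a tight loop and timing it gives a rough comparison, though results will likely depend on the microarchitecture and on whether the constant load hits in cache.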