https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
I also noticed an interesting case that is affected by a separate "m" alternative:
typedef long v2di __attribute__((vector_size(16)));
v2di
foo (v2di a)
{
  a[1] = 1113;
  return a;
}
With -O2, GCC generates:
foo(long __vector(2)):
        movhps  .LC0(%rip), %xmm0
        ret
.LC0:
        .quad   1113
LLVM generates:
foo(long __vector(2)):                  # @foo(long __vector(2))
        movl    $1113, %eax             # imm = 0x459
        movq    %rax, %xmm1
        punpcklqdq      %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0]
        retq
A microbenchmark shows that both sequences are almost equally fast, so I really
don't know which one is better.