https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Also notice an interesting case impacted by a separate m alternative.

typedef long v2di __attribute__((vector_size(16)));

v2di
foo (v2di a)
{
  a[1] = 1113;
  return a;
}

with -O2 gcc generates

foo(long __vector(2)):
        movhps  .LC0(%rip), %xmm0
        ret
.LC0:
        .quad   1113

llvm has

foo(long __vector(2)):                            # @foo(long __vector(2))
        movl    $1113, %eax                     # imm = 0x459
        movq    %rax, %xmm1
        punpcklqdq      %xmm1, %xmm0            # xmm0 = xmm0[0],xmm1[0]
        retq

Microbenchmarks show both sequences are almost equally fast; I really don't
know which is better.
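
For anyone who wants to experiment with the two forms side by side, here is a
minimal sketch using SSE2 intrinsics. It is not taken from either compiler; the
helper names (set_high_from_memory, set_high_from_register, sequences_agree)
are hypothetical, and the optimizer is free to pick different instructions than
the ones named in the comments. It only checks that both forms compute the same
vector; it makes no claim about timing. It assumes an x86-64 target.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86-64 target */

/* Hypothetical helper: replace the high element from a constant in memory,
   roughly the shape of the movhps sequence above. */
static __m128i set_high_from_memory(__m128i a)
{
    static const long long k = 1113;
    /* _mm_loadh_pi loads 64 bits into the upper half, keeping the lower half. */
    __m128 r = _mm_loadh_pi(_mm_castsi128_ps(a), (const __m64 *)&k);
    return _mm_castps_si128(r);
}

/* Hypothetical helper: build the constant in a register and interleave,
   roughly the shape of the movq + punpcklqdq sequence above. */
static __m128i set_high_from_register(__m128i a)
{
    __m128i b = _mm_cvtsi64_si128(1113);  /* 1113 in the low 64 bits of b */
    return _mm_unpacklo_epi64(a, b);      /* result = { a[0], b[0] } */
}

/* Returns 1 if both helpers produce { low, 1113 }, otherwise 0. */
int sequences_agree(long long low)
{
    long long m[2], r[2];
    __m128i a = _mm_set_epi64x(-1, low);  /* a = { low, -1 } */

    _mm_storeu_si128((__m128i *)m, set_high_from_memory(a));
    _mm_storeu_si128((__m128i *)r, set_high_from_register(a));

    return m[0] == low && m[1] == 1113 && r[0] == low && r[1] == 1113;
}
```

Calling sequences_agree with any value for the low element should return 1.
To compare speed, each helper would need to be timed in a loop with the
generated assembly checked first, since the compiler may canonicalize both
helpers to the same instruction.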
