https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
A simple C testcase:

typedef struct
{
  unsigned char* p;
  unsigned int a;
}st;

st foo (unsigned char* p, unsigned char* q)
{
  return (st){p, (unsigned int)(q - p)};
}
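The sketch below is not from the report. It makes the testcase self-contained and spells out what the caller should receive. It assumes an LP64, little-endian x86-64 target, where st is 16 bytes and the SysV ABI returns it in RAX (st.p) and RDX (st.a in the low 32 bits). The eightbytes type and the as_eightbytes helper are hypothetical and exist only for illustration. Under those assumptions, the result could in principle be produced without any vector register at all, with something like `mov rax, rdi` / `mov edx, esi` / `sub edx, edi`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
  unsigned char* p;
  unsigned int a;
} st;

st foo (unsigned char* p, unsigned char* q)
{
  return (st){p, (unsigned int)(q - p)};
}

/* Hypothetical helper type: the two 8-byte halves of st as the x86-64
   SysV ABI returns them.  Assumes a little-endian LP64 target.  */
typedef struct
{
  uint64_t rax;  /* first eightbyte: st.p */
  uint64_t rdx;  /* second eightbyte: st.a in the low 32 bits */
} eightbytes;

static eightbytes as_eightbytes (st s)
{
  eightbytes e = {0, 0};
  assert (sizeof s.p <= sizeof e.rax);
  memcpy (&e.rax, &s.p, sizeof s.p);
  /* The upper 32 bits of the second eightbyte are padding in st;
     this model simply leaves them zero.  */
  memcpy (&e.rdx, &s.a, sizeof s.a);
  return e;
}
```

For example, with a buffer buf, foo (buf, buf + 20) yields rax equal to the address of buf and rdx equal to 20, which is exactly the pair of values the assembly in issue 1 loads into rax and rdx.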


There are two issues here.
1. GCC goes through memory (a stack store followed by two loads) to move the value from xmm0 into general-purpose registers:
---
        vmovdqa XMMWORD PTR [rsp-24], xmm0
        mov     rax, QWORD PTR [rsp-24]
        mov     rdx, QWORD PTR [rsp-16]
---

2. GCC uses vpinsrd to initialize st.a, which is suboptimal after reload:

(insn 9 24 23 2 (set (reg:V4SI 20 xmm0 [89])
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 4 si [88]))
            (reg:V4SI 21 xmm1 [94])
            (const_int 4 [0x4]))) "../test.c":9:42 4387 {sse4_1_pinsrd}
     (nil))
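To make the RTL above concrete, here is a small portable C model of (vec_merge:V4SI (vec_duplicate:V4SI x) other (const_int mask)), following the GCC internals semantics of vec_merge: if bit i of the mask is set, lane i comes from the first operand, otherwise from the second. The name vec_merge_v4si is hypothetical and is not a GCC function. With mask 4 (binary 0100) only lane 2 is replaced. On little-endian x86-64, lane 2 covers bytes 8..11 of xmm0, which is offsetof (st, a), while lanes 0 and 1 from xmm1 presumably already hold st.p.

```c
#include <assert.h>
#include <stdint.h>

/* Models (vec_merge:V4SI (vec_duplicate:V4SI x) other (const_int mask)).
   Bit i of MASK set: lane i is x (the broadcast first operand).
   Bit i clear:       lane i is other[i].  */
static void vec_merge_v4si (uint32_t out[4], uint32_t x,
                            const uint32_t other[4], unsigned mask)
{
  assert (mask < 16);  /* V4SI has only four lanes */
  for (int i = 0; i < 4; i++)
    out[i] = ((mask >> i) & 1u) ? x : other[i];
}
```

Calling it with other = {1, 2, 3, 4}, x = 99 and mask = 4 yields {1, 2, 99, 4}. Only lane 2 changes, which matches what a single pinsrd into element 2 does.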
