https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
A simple C testcase:
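typedef struct
{
  unsigned char* p;
  unsigned int a;
} st;

st foo (unsigned char* p, unsigned char* q)
{
  /* A compound literal keeps this valid C; a bare braced list in a
     return statement is only accepted as C++.  */
  return (st){p, (unsigned int)(q-p)};
}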
There are two issues here.
1. GCC goes through memory to move the value from an xmm register to GPRs.
---
vmovdqa XMMWORD PTR [rsp-24], xmm0
mov rax, QWORD PTR [rsp-24]
mov rdx, QWORD PTR [rsp-16]
---
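For comparison, a minimal sketch of a register-only way to split a 128-bit vector into two 64-bit halves, written with SSE2 intrinsics. The names `halves`, `split_v2di` and `check_split` are hypothetical and only serve this illustration; the instructions named in the comments are what these intrinsics typically map to, not a guarantee of what GCC emits for the testcase above.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */
#include <stdint.h>

/* Hypothetical container for the two 64-bit halves of an xmm value. */
typedef struct { uint64_t lo, hi; } halves;

/* Split a 128-bit vector into its halves without a stack round trip. */
halves split_v2di(__m128i v)
{
    halves h;
    /* Low 64 bits: typically a single movq from xmm to a GPR. */
    h.lo = (uint64_t)_mm_cvtsi128_si64(v);
    /* High 64 bits: shuffle the upper half down (punpckhqdq), then movq. */
    h.hi = (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
    return h;
}

/* Small self-check: _mm_set_epi64x takes the high element first. */
void check_split(void)
{
    halves h = split_v2di(_mm_set_epi64x(0x1122334455667788LL,
                                         0x0102030405060708LL));
    assert(h.lo == 0x0102030405060708ULL);
    assert(h.hi == 0x1122334455667788ULL);
}
```

With SSE4.1 available, the high half could instead be read with `_mm_extract_epi64 (v, 1)` (pextrq), which avoids the extra shuffle.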
2. GCC uses vpinsrd to initialize st.a, which is suboptimal after reload.
(insn 9 24 23 2 (set (reg:V4SI 20 xmm0 [89])
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 4 si [88]))
(reg:V4SI 21 xmm1 [94])
(const_int 4 [0x4]))) "../test.c":9:42 4387 {sse4_1_pinsrd}
(nil))