https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83827
            Bug ID: 83827
           Summary: vector load/store with struct in registers
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

(first seen with complex numbers, quite likely a dup of some other vector PR,
could be target instead of rtl-opt)

typedef double vec __attribute__((vector_size(16)));
struct A { double a, b; };

vec f(A x){
  vec v = { x.a, x.b };
  return v;
}

A add(A x, A y){
  return { x.a+y.a, x.b+y.b };
}

In f, we build v with

  _1 = x.a;
  _2 = x.b;
  v_4 = {_1, _2};

while in add, we do

  vect__1.2_10 = MEM[(double *)&x];
  vect__2.5_11 = MEM[(double *)&y];
  vect__3.6_12 = vect__1.2_10 + vect__2.5_11;
  MEM[(double *)&D.2881] = vect__3.6_12;

The first version yields (g++ -O3 -march=skylake) the nice

  vunpcklpd %xmm1, %xmm0, %xmm0

while add has the lengthy

  vmovq   %xmm0, -40(%rsp)
  vmovq   %xmm1, -32(%rsp)
  vmovapd -40(%rsp), %xmm5
  vmovq   %xmm2, -24(%rsp)
  vmovq   %xmm3, -16(%rsp)
  vaddpd  -24(%rsp), %xmm5, %xmm4
  vmovaps %xmm4, -40(%rsp)
  vmovsd  -32(%rsp), %xmm1
  vmovsd  -40(%rsp), %xmm0

With -O2, we also turn

  A g(vec x){ return { x[0], x[1] }; }

into the nice

  vunpckhpd %xmm0, %xmm0, %xmm1

(this PR is independent of whether it was a good idea or not for SLP to
vectorize add)
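
As an aside (not part of the original report), a hand-written variant of add
that goes through the vector type explicitly, combining the construction shown
in f with the extraction shown in g, is a sketch of the code we would like the
compiler to reach on its own. The function name add2 is hypothetical, and the
expected codegen (vunpcklpd + vaddpd + vunpckhpd, no stack round-trip) is an
assumption following from the f and g dumps above, not something verified here:

  /* Hypothetical rewrite of add; assumes the same typedef vec and struct A
     as in the testcase above. */
  A add2(A x, A y){
    vec vx = { x.a, x.b };     /* build the vector as f does */
    vec vy = { y.a, y.b };
    vec s = vx + vy;           /* single vaddpd expected */
    return { s[0], s[1] };     /* extract lanes as g does */
  }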