https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83827

            Bug ID: 83827
           Summary: vector load/store with struct in registers
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

(first seen with complex numbers; quite likely a dup of some other vector PR, and it could be a target issue rather than rtl-optimization)

typedef double vec __attribute__((vector_size(16)));
struct A { double a, b; };
vec f(A x){
  vec v = { x.a, x.b };
  return v;
}
A add(A x, A y){
  return { x.a+y.a, x.b+y.b };
}

In f, we build v with
  _1 = x.a;
  _2 = x.b;
  v_4 = {_1, _2};
while in add, we do
  vect__1.2_10 = MEM[(double *)&x];
  vect__2.5_11 = MEM[(double *)&y];
  vect__3.6_12 = vect__1.2_10 + vect__2.5_11;
  MEM[(double *)&D.2881] = vect__3.6_12;
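
For comparison, a register-friendly form of add at the source level would mirror f: build each operand with a constructor, add, and extract the lanes for the return value. A minimal sketch (the name add2 is mine, not part of the testcase; vec and A as defined above):

A add2(A x, A y){
  vec vx = { x.a, x.b };   // constructor, as in f
  vec vy = { y.a, y.b };
  vec s = vx + vy;
  return { s[0], s[1] };   // extract the lanes for the return value
}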

With g++ -O3 -march=skylake, the version in f yields the nice
        vunpcklpd       %xmm1, %xmm0, %xmm0
while add has the lengthy
        vmovq   %xmm0, -40(%rsp)
        vmovq   %xmm1, -32(%rsp)
        vmovapd -40(%rsp), %xmm5
        vmovq   %xmm2, -24(%rsp)
        vmovq   %xmm3, -16(%rsp)
        vaddpd  -24(%rsp), %xmm5, %xmm4
        vmovaps %xmm4, -40(%rsp)
        vmovsd  -32(%rsp), %xmm1
        vmovsd  -40(%rsp), %xmm0
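
By hand, the whole computation fits in xmm registers. A hypothetical intrinsics sketch (the name add_intrin and the exact instruction selection are my illustration, not compiler output) of the kind of sequence one would hope for here, i.e. two vunpcklpd, a vaddpd, and a vunpckhpd for the high lane of the return value:

#include <immintrin.h>
A add_intrin(A x, A y){
  __m128d vx = _mm_unpacklo_pd(_mm_set_sd(x.a), _mm_set_sd(x.b)); // vunpcklpd
  __m128d vy = _mm_unpacklo_pd(_mm_set_sd(y.a), _mm_set_sd(y.b)); // vunpcklpd
  __m128d s = _mm_add_pd(vx, vy);                                 // vaddpd
  return { _mm_cvtsd_f64(s),                                      // low lane -> a
           _mm_cvtsd_f64(_mm_unpackhi_pd(s, s)) };                // high lane -> b (vunpckhpd)
}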

With -O2, we also turn
A g(vec x){
  return { x[0], x[1] };
}
into the nice
        vunpckhpd       %xmm0, %xmm0, %xmm1


(this PR is independent of whether it was a good idea for SLP to vectorize add in the first place)
