[Bug middle-end/88873] missing vectorization for decomposed operations on a vector type

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 21 Jun 2023 06:33:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
          Component|tree-optimization           |middle-end
           Keywords|                            |ra

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we "like"

v2df bar (v2df a, v2df b, v2df c)
{
  vector(2) double vect__4.19;
  vect__4.19_19 = .FMA (b_10(D), a_11(D), c_9(D)); [tail call]
  return vect__4.19_19;
}

but foo has the usual ABI issues:

struct s_t foo (struct s_t a, struct s_t b, struct s_t c)
{
  vector(2) double vect__4.13;
  vector(2) double vect__1.12;
  vector(2) double vect__3.9;
  vector(2) double vect__2.6;
  struct s_t D.4355;
  vect__1.12_14 = MEM <vector(2) double> [(double *)&c];
  vect__2.6_12 = MEM <vector(2) double> [(double *)&b];
  vect__3.9_13 = MEM <vector(2) double> [(double *)&a];
  vect__4.13_15 = .FMA (vect__2.6_12, vect__3.9_13, vect__1.12_14);
  MEM <vector(2) double> [(double *)&D.4355] = vect__4.13_15;
  return D.4355;
}

where the argument passing / return value handling gets us

foo:
        vmovq   %xmm3, %rax
        vmovq   %xmm0, -24(%rsp)
        vpinsrq $1, %rax, %xmm2, %xmm7
        vmovq   %xmm5, %rax
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm7, %xmm6
        vpinsrq $1, %rax, %xmm4, %xmm2
        vmovq   %xmm4, -40(%rsp)
        vfmadd132pd     -24(%rsp), %xmm2, %xmm6
        vmovq   %xmm5, -32(%rsp)
        vmovapd %xmm6, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret

That's very weird: we seem to clean things up only half-way, failing for
example to eliminate the useless vmovq   %xmm5, -32(%rsp) spill.
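
For readers without the testcase at hand, here is a minimal sketch of what
bar and foo might look like at the source level.  The comment above does not
quote the source, so the field names x and y, the function bodies and the
helper check_foo_matches_bar are all assumptions.  The sketch writes a * b + c
rather than calling fma(); GCC may contract that into an FMA under its default
-ffp-contract=fast, but that depends on target flags.

```c
/* Hypothetical reconstruction; not the testcase from the bug report.  */
#include <assert.h>

/* GCC vector extension: two doubles in one 16-byte vector.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Assumed layout: two doubles, 16 bytes, passed in two SSE registers
   on x86-64 (xmm0/xmm1 for the first argument, as in the RTL dump).  */
struct s_t { double x, y; };

/* Aggregate form: the per-element operations the vectorizer has to
   recombine into one vector FMA.  */
struct s_t
foo (struct s_t a, struct s_t b, struct s_t c)
{
  struct s_t r;
  r.x = a.x * b.x + c.x;
  r.y = a.y * b.y + c.y;
  return r;
}

/* Vector form: arguments and result are already whole vectors, so no
   repacking is needed at the call boundary.  */
v2df
bar (v2df a, v2df b, v2df c)
{
  return a * b + c;
}

/* Sanity check that both forms compute the same two lanes.  */
void
check_foo_matches_bar (void)
{
  struct s_t a = { 1.0, 2.0 }, b = { 3.0, 4.0 }, c = { 5.0, 6.0 };
  struct s_t r = foo (a, b, c);
  v2df v = bar ((v2df) { a.x, a.y }, (v2df) { b.x, b.y },
                (v2df) { c.x, c.y });
  assert (r.x == v[0] && r.y == v[1]);
}
```

The two functions compute the same values; the difference discussed here is
only in how the ABI hands the operands over, as one vector register per
argument for bar versus two scalar registers per argument for foo.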

The IBM folks who want to use SRA-style analysis at RTL expansion time
might in the end deal with this as well.

We expand to

(insn 2 21 3 2 (set (reg:DF 91)
        (reg:DF 20 xmm0 [ a ])) "t2.c":8:1 -1
     (nil))
(insn 3 2 4 2 (set (reg:DF 92)
        (reg:DF 21 xmm1 [ a+8 ])) "t2.c":8:1 -1
     (nil))
(insn 4 3 5 2 (set (reg:TI 90)
        (const_int 0 [0])) "t2.c":8:1 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DF (reg:TI 90) 0)
        (reg:DF 91)) "t2.c":8:1 -1
     (nil))
(insn 6 5 7 2 (set (subreg:DF (reg:TI 90) 8)
        (reg:DF 92)) "t2.c":8:1 -1
     (nil))

so we're using TImode pseudos because the aggregate has TImode, but the
accesses should tell us that V2DFmode would be a much better choice
(or alternatively V2DImode in case float modes are too dangerous).
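
In source terms, the whole-vector MEM accesses in the vectorized foo
correspond roughly to the sketch below: each 16-byte aggregate is only ever
read or written as one vector(2) double, which is the argument for a vector
mode over TImode.  The struct layout and the names load_v2df and
foo_whole_vector are assumptions made for illustration, not taken from the
testcase.

```c
/* Illustrative sketch only; names and layout are assumed.  */
#include <assert.h>
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));

struct s_t { double x, y; };   /* assumed layout of the aggregate */

/* The aggregate and the vector cover the same 16 bytes.  */
_Static_assert (sizeof (struct s_t) == sizeof (v2df),
                "struct s_t and v2df must have the same size");

/* Read the whole aggregate as one vector, like
   MEM <vector(2) double> [(double *)&a] in the GIMPLE dump.  */
static v2df
load_v2df (const struct s_t *p)
{
  v2df v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Every access to a, b, c and the result is a full-width vector
   access; nothing uses the 128-bit integer view of the bytes.  */
struct s_t
foo_whole_vector (struct s_t a, struct s_t b, struct s_t c)
{
  v2df r = load_v2df (&b) * load_v2df (&a) + load_v2df (&c);
  struct s_t d;
  memcpy (&d, &r, sizeof d);
  return d;
}
```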

The actual single use is then

(insn 23 20 24 2 (set (reg:V2DF 85 [ vect__4.13 ])
        (fma:V2DF (subreg:V2DF (reg/v:TI 93 [ b ]) 0)
            (subreg:V2DF (reg/v:TI 89 [ a ]) 0)
            (subreg:V2DF (reg/v:TI 97 [ c ]) 0))) "t2.c":9:18 -1
     (nil))

and of course IRA/LRA are not able to deal with this situation nicely,
possibly because of the subreg sets of the TImode pseudo which we
do not split (well, we can't).  We could possibly use STV to handle
some of this though(?)
