https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
          Component|tree-optimization           |middle-end
           Keywords|                            |ra
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we "like" bar, which stays a single vector .FMA:
v2df bar (v2df a, v2df b, v2df c)
{
  vector(2) double vect__4.19;
  vect__4.19_19 = .FMA (b_10(D), a_11(D), c_9(D)); [tail call]
  return vect__4.19_19;
}
but foo has the usual ABI issues:
struct s_t foo (struct s_t a, struct s_t b, struct s_t c)
{
  vector(2) double vect__4.13;
  vector(2) double vect__1.12;
  vector(2) double vect__3.9;
  vector(2) double vect__2.6;
  struct s_t D.4355;
  vect__1.12_14 = MEM <vector(2) double> [(double *)&c];
  vect__2.6_12 = MEM <vector(2) double> [(double *)&b];
  vect__3.9_13 = MEM <vector(2) double> [(double *)&a];
  vect__4.13_15 = .FMA (vect__2.6_12, vect__3.9_13, vect__1.12_14);
  MEM <vector(2) double> [(double *)&D.4355] = vect__4.13_15;
  return D.4355;
}
where the argument passing / return value handling gets us
foo:
        vmovq   %xmm3, %rax
        vmovq   %xmm0, -24(%rsp)
        vpinsrq $1, %rax, %xmm2, %xmm7
        vmovq   %xmm5, %rax
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm7, %xmm6
        vpinsrq $1, %rax, %xmm4, %xmm2
        vmovq   %xmm4, -40(%rsp)
        vfmadd132pd     -24(%rsp), %xmm2, %xmm6
        vmovq   %xmm5, -32(%rsp)
        vmovapd %xmm6, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret
That's very weird. We also seem to clean things up only half-way: for
example, we fail to eliminate the useless vmovq %xmm5, -32(%rsp) spill.
The IBM folks who want to use SRA-style analysis at RTL expansion time
might in the end deal with this as well.
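For reference, a minimal C sketch of source that should give dumps shaped like the two above. This is a reconstruction from the dumps, not the testcase attached to the bug: the field names `x` and `y` are assumptions, and a plain multiply-add is used on the assumption that GCC contracts it into `.FMA` on an FMA-capable target (for example with `-O3 -mfma` and the default `-ffp-contract=fast`).

```c
/* Assumed layout: two doubles, 16 bytes in total.  The field names
   are hypothetical; the dumps only show the type name s_t.  */
struct s_t { double x, y; };

/* GCC vector extension type matching the v2df in the dump.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Aggregate version: on x86-64 each struct argument is passed as two
   separate doubles, one per SSE register.  */
struct s_t
foo (struct s_t a, struct s_t b, struct s_t c)
{
  struct s_t r;
  r.x = a.x * b.x + c.x;
  r.y = a.y * b.y + c.y;
  return r;
}

/* Vector version: each argument already occupies one SSE register.  */
v2df
bar (v2df a, v2df b, v2df c)
{
  return a * b + c;
}
```

Comparing the `-O3` assembly for the two functions should show the difference described here: `bar` needs no shuffling, while `foo` has to reassemble each argument from two registers first.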
We expand to
(insn 2 21 3 2 (set (reg:DF 91)
        (reg:DF 20 xmm0 [ a ])) "t2.c":8:1 -1
     (nil))
(insn 3 2 4 2 (set (reg:DF 92)
        (reg:DF 21 xmm1 [ a+8 ])) "t2.c":8:1 -1
     (nil))
(insn 4 3 5 2 (set (reg:TI 90)
        (const_int 0 [0])) "t2.c":8:1 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DF (reg:TI 90) 0)
        (reg:DF 91)) "t2.c":8:1 -1
     (nil))
(insn 6 5 7 2 (set (subreg:DF (reg:TI 90) 8)
        (reg:DF 92)) "t2.c":8:1 -1
     (nil))
so we're using TImode pseudos because the aggregate has TImode, but the
accesses should tell us that V2DFmode would be a much better choice
(or alternatively V2DImode in case float modes are too dangerous).
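The difference between the two mode choices can be mimicked in C. The sketch below is only an analogy for the RTL above, not what the expander does internally: `as_ti` and `as_v2df` are hypothetical helper names, `unsigned __int128` (available with GCC on 64-bit targets) stands in for TImode, and `memcpy` stands in for the subreg writes.

```c
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* TImode-style: zero a 128-bit integer, write one double into each
   half (compare insns 4 to 6), then reinterpret the integer as a
   vector where it is used.  */
v2df
as_ti (double lo, double hi)
{
  unsigned __int128 ti = 0;
  v2df v;
  memcpy ((char *) &ti, &lo, sizeof lo);      /* bytes 0..7  */
  memcpy ((char *) &ti + 8, &hi, sizeof hi);  /* bytes 8..15 */
  memcpy (&v, &ti, sizeof v);
  return v;
}

/* V2DFmode-style: build the vector directly from its two elements,
   with no integer container in between.  */
v2df
as_v2df (double lo, double hi)
{
  v2df v = { lo, hi };
  return v;
}
```

Both helpers return the same value; the point is that the second form never introduces a 128-bit integer object that later has to be viewed as a vector.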
The actual single use is then
(insn 23 20 24 2 (set (reg:V2DF 85 [ vect__4.13 ])
        (fma:V2DF (subreg:V2DF (reg/v:TI 93 [ b ]) 0)
            (subreg:V2DF (reg/v:TI 89 [ a ]) 0)
            (subreg:V2DF (reg/v:TI 97 [ c ]) 0))) "t2.c":9:18 -1
     (nil))
and of course IRA/LRA are not able to deal with this situation nicely,
possibly because of the subreg sets of the TImode pseudo, which we
do not split (well, we can't). We could possibly use STV (the x86
scalar-to-vector pass) to handle some of this, though(?)