https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
          Component|tree-optimization           |middle-end
           Keywords|                            |ra

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we "like"

v2df bar (v2df a, v2df b, v2df c)
{
  vector(2) double vect__4.19;

  vect__4.19_19 = .FMA (b_10(D), a_11(D), c_9(D)); [tail call]
  return vect__4.19_19;

}

but foo has the usual ABI issues:

struct s_t foo (struct s_t a, struct s_t b, struct s_t c)
{
  vector(2) double vect__4.13;
  vector(2) double vect__1.12;
  vector(2) double vect__3.9;
  vector(2) double vect__2.6;
  struct s_t D.4355;

  vect__1.12_14 = MEM <vector(2) double> [(double *)&c];
  vect__2.6_12 = MEM <vector(2) double> [(double *)&b];
  vect__3.9_13 = MEM <vector(2) double> [(double *)&a];
  vect__4.13_15 = .FMA (vect__2.6_12, vect__3.9_13, vect__1.12_14);
  MEM <vector(2) double> [(double *)&D.4355] = vect__4.13_15;
  return D.4355;

}

where the argument passing / return value handling gets us

foo:
        vmovq   %xmm3, %rax
        vmovq   %xmm0, -24(%rsp)
        vpinsrq $1, %rax, %xmm2, %xmm7
        vmovq   %xmm5, %rax
        vmovq   %xmm1, -16(%rsp)
        vmovapd %xmm7, %xmm6
        vpinsrq $1, %rax, %xmm4, %xmm2
        vmovq   %xmm4, -40(%rsp)
        vfmadd132pd     -24(%rsp), %xmm2, %xmm6
        vmovq   %xmm5, -32(%rsp)
        vmovapd %xmm6, -56(%rsp)
        vmovsd  -48(%rsp), %xmm1
        vmovsd  -56(%rsp), %xmm0
        ret

That's very weird. We also seem to half-way clean things up but fail to
eliminate, for example, the useless vmovq %xmm5, -32(%rsp) spill.

The IBM folks who want to use SRA-style analysis at RTL expansion time
might in the end deal with this as well.
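For reference, here is a minimal sketch of the kind of source the dumps above come from. The actual t2.c is not quoted in this comment, so the definitions of v2df, struct s_t, foo and bar below are assumptions reconstructed from the GIMPLE: a 16-byte vector type passed and returned in single registers versus a two-double struct whose halves arrive in separate SSE registers (a in %xmm0/%xmm1, b in %xmm2/%xmm3, c in %xmm4/%xmm5, matching the asm).

```c
/* Hypothetical reconstruction of the t2.c testcase; the original source
   is not part of this comment.  Compile with e.g. gcc -O3 -mfma -S and
   compare the code for foo and bar.  */

/* Two doubles in one 16-byte vector (GCC vector extension); GIMPLE dumps
   print this type as "vector(2) double".  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Aggregate of two doubles.  Under the x86-64 SysV ABI each 8-byte half
   is classified SSE, so the struct is passed in two XMM registers and
   returned in %xmm0/%xmm1.  */
struct s_t { double x, y; };

/* Vector version: the whole operand lives in one register, and with GNU
   C's default -ffp-contract=fast, a * b + c may become a single FMA.  */
v2df bar (v2df a, v2df b, v2df c)
{
  return a * b + c;
}

/* Struct version: the vectorizer may combine both lanes into one V2DF
   FMA, as in the GIMPLE above, but each operand first has to be
   assembled from two separate DF argument registers.  */
struct s_t foo (struct s_t a, struct s_t b, struct s_t c)
{
  struct s_t r;
  r.x = a.x * b.x + c.x;
  r.y = a.y * b.y + c.y;
  return r;
}
```

Both functions compute the same values; the difference is only in how operands cross the call boundary, which is where the extra vmovq/vpinsrq and stack traffic in foo come from.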
We expand to

(insn 2 21 3 2 (set (reg:DF 91)
        (reg:DF 20 xmm0 [ a ])) "t2.c":8:1 -1
     (nil))
(insn 3 2 4 2 (set (reg:DF 92)
        (reg:DF 21 xmm1 [ a+8 ])) "t2.c":8:1 -1
     (nil))
(insn 4 3 5 2 (set (reg:TI 90)
        (const_int 0 [0])) "t2.c":8:1 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DF (reg:TI 90) 0)
        (reg:DF 91)) "t2.c":8:1 -1
     (nil))
(insn 6 5 7 2 (set (subreg:DF (reg:TI 90) 8)
        (reg:DF 92)) "t2.c":8:1 -1
     (nil))

so we're using TImode pseudos because the aggregate has TImode, but the
accesses should tell us that V2DFmode would be a way better choice (or
alternatively V2DImode, in case float modes are too dangerous). The actual
single use is then

(insn 23 20 24 2 (set (reg:V2DF 85 [ vect__4.13 ])
        (fma:V2DF (subreg:V2DF (reg/v:TI 93 [ b ]) 0)
            (subreg:V2DF (reg/v:TI 89 [ a ]) 0)
            (subreg:V2DF (reg/v:TI 97 [ c ]) 0))) "t2.c":9:18 -1
     (nil))

and of course IRA/LRA are not able to deal with this situation nicely,
possibly because of the subreg sets of the TImode pseudo, which we do not
split (well, we can't). We could possibly use STV to handle some of this,
though(?)
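The point that "the accesses should tell us" which mode to use can be illustrated with a toy heuristic. This is a hypothetical sketch, not GCC code: struct access, enum mode_choice and choose_mode() are invented names, and the model only covers the 16-byte case seen here (DF stores to byte offsets 0 and 8 of a TImode pseudo, plus a whole-register V2DF use).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one access to a 16-byte (TImode) aggregate as
   seen at expansion time: byte offset, size in bytes, and whether the
   access is in a floating-point mode (DF / V2DF).  */
struct access { unsigned offset; unsigned size; bool is_float; };

enum mode_choice { MODE_TI, MODE_V2DI, MODE_V2DF };

/* Choose a pseudo mode for the aggregate.  If every partial access is an
   8-byte lane at offset 0 or 8, a two-lane vector mode fits: V2DF when all
   lane accesses are floating point, otherwise V2DI (avoiding float modes).
   Whole 16-byte accesses fit any 16-byte mode and are treated as neutral.
   Anything else, or no lane accesses at all, keeps TImode.  */
enum mode_choice
choose_mode (const struct access *acc, size_t n)
{
  bool all_float = true;
  bool saw_lane = false;

  for (size_t i = 0; i < n; i++)
    {
      if (acc[i].offset == 0 && acc[i].size == 16)
        continue; /* whole-register use, e.g. the subreg:V2DF in the fma */
      if (acc[i].size != 8 || (acc[i].offset != 0 && acc[i].offset != 8))
        return MODE_TI; /* access does not line up with a 64-bit lane */
      saw_lane = true;
      if (!acc[i].is_float)
        all_float = false;
    }
  if (!saw_lane)
    return MODE_TI;
  return all_float ? MODE_V2DF : MODE_V2DI;
}
```

Fed the accesses from the RTL above (subreg:DF at offsets 0 and 8, then a whole subreg:V2DF use), this sketch picks V2DF. A real implementation would also have to weigh what the target can do cheaply with each mode, which this model ignores.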