https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68961
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, the testcase seems to be really "special" given the exact register overlap
between the return value and the incoming args (if you look at the vectorizer's
choice to say this is profitable to vectorize).  Making it fairer, like with

long double
pack (double x, double a, double aa)
{
  union u_ld u;
  u.d[0] = a;
  u.d[1] = aa;
  return u.ld;
}

produces without SLP

pack:
        mfvsrd 10,2
        fmr 2,3
        mtvsrd 1,10
        blr

and with

pack:
        xxpermdi 0,3,2,0
        addi 9,1,-16
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        lfd 1,-16(1)
        lfd 2,-8(1)
        blr

so that would be the thing to compare cost-wise.  Currently we have

t.c:9:11: note: Cost model analysis:
  Vector inside of basic block cost: 1
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 2

so for some reason the vector build is not accounted for.  Ah, I see why.

Mine.
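
To make the arithmetic in the cost dump concrete, here is a minimal sketch of the comparison it implies. It is not GCC's implementation: the struct, the function names, the "strictly cheaper" rule and the extra_build_cost parameter are all assumptions for illustration. Only the four numbers used in the usage note come from the dump above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the four numbers printed in the dump;
   this is not a GCC data structure.  */
struct bb_costs
{
  int vec_inside;    /* "Vector inside of basic block cost" */
  int vec_prologue;  /* "Vector prologue cost" */
  int vec_epilogue;  /* "Vector epilogue cost" */
  int scalar;        /* "Scalar cost of basic block" */
};

/* Sum of the vector-side costs.  extra_build_cost is an assumed
   stand-in for the vector build that the dump does not account for;
   pass 0 to reproduce the dump as printed.  */
int
bb_vector_total (const struct bb_costs *c, int extra_build_cost)
{
  assert (extra_build_cost >= 0);
  return c->vec_inside + c->vec_prologue + c->vec_epilogue
         + extra_build_cost;
}

/* Assumed decision rule: vectorize only when the vector total is
   strictly below the scalar cost.  */
bool
bb_slp_profitable (const struct bb_costs *c, int extra_build_cost)
{
  return bb_vector_total (c, extra_build_cost) < c->scalar;
}
```

With the dump's values { 1, 0, 0, 2 } and extra_build_cost of 0, the vector total is 1 against a scalar cost of 2, so the sketch reports the block as profitable. Under the same assumed rule, any build cost of 1 or more would make the vector total at least 2 and reverse that decision.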