https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93897
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
Known to fail| |10.0, 7.5.0, 8.3.1, 9.2.1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're "correctly" costing an extra spill and the two loads:
t.C:10:24: note: vect_model_store_cost: inside_cost = 16, prologue_cost = 40 .
0x59cc2a0 y_4(D) 1 times vec_construct costs 8 in prologue
0x59cc2a0 y_4(D) 1 times vector_store costs 16 in body
0x59cc2a0 y_4(D) 1 times vector_store costs 16 in epilogue
0x59cc2a0 y_4(D) 2 times scalar_load costs 16 in epilogue
0x59f1130 y_4(D) 1 times scalar_store costs 12 in body
0x59f1130 z_6(D) 1 times scalar_store costs 12 in body
t.C:10:24: note: Cost model analysis:
Vector inside of basic block cost: 16
Vector prologue cost: 8
Vector epilogue cost: 32
Scalar cost of basic block: 24
t.C:10:24: missed: not vectorized: vectorization is not profitable.
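The summary follows from the per-statement costs: prologue 8 (one vec_construct), body 16 (one vector_store), epilogue 16 + 16 = 32 (the spill plus the two scalar loads), against 2 x 12 = 24 for the two scalar stores being replaced. A minimal sketch of the comparison, assuming the GCC 10-era basic-block SLP rule that vectorizes only when inside + prologue + epilogue cost does not exceed the scalar cost (the struct and function names are hypothetical):

```cpp
#include <cassert>

// Hypothetical model of the BB SLP profitability check. Assumption: the
// vector side wins ties, and all three vector cost buckets are summed.
struct BbCosts {
  int inside;    // "Vector inside of basic block cost"
  int prologue;  // "Vector prologue cost"
  int epilogue;  // "Vector epilogue cost"
  int scalar;    // "Scalar cost of basic block"
};

constexpr int vector_total(const BbCosts &c) {
  return c.inside + c.prologue + c.epilogue;
}

constexpr bool profitable(const BbCosts &c) {
  // Vectorize unless the vector side is strictly more expensive.
  return vector_total(c) <= c.scalar;
}

// Figures copied from the dump: 16 inside, 8 prologue, 32 epilogue, 24 scalar.
constexpr BbCosts kDump{16, 8, 32, 24};
static_assert(vector_total(kDump) == 56, "16 + 8 + 32");
static_assert(!profitable(kDump), "56 > 24, so BB SLP gives up");
```

So the vectorizer itself leaves the three stores scalar; the vector code seen later comes from somewhere else.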
and we then expand from
<bb 2> [local count: 1073741824]:
D.2953.x = x_2(D);
D.2953.y = y_4(D);
D.2953.z = z_6(D);
return D.2953;
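The original t.C is not quoted in this comment. The following is a hypothetical reproducer whose layout is inferred from the RTL dump: x is DImode at offset 0, y and z are SImode in 32-bit lanes 2 and 3 (vec_merge masks 4 and 8), and the whole aggregate fits in one TImode register. A braced return like this is expected to gimplify to field-wise stores into a temporary similar to D.2953 above; the names S and make are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the struct in t.C, laid out to match the RTL.
struct S {
  std::int64_t x;  // (reg/v:DI 84 [ x ]) -> low 64 bits of reg:TI 82
  std::int32_t y;  // inserted into V4SI lane 2 (mask 0x4)
  std::int32_t z;  // inserted into V4SI lane 3 (mask 0x8)
};

static_assert(sizeof(S) == 16, "fits in a single 128-bit TImode register");
static_assert(offsetof(S, y) == 8, "y occupies 32-bit lane 2");
static_assert(offsetof(S, z) == 12, "z occupies 32-bit lane 3");

// Returning a braced initializer builds a temporary field by field and
// returns it, the shape of the GIMPLE shown above.
S make(std::int64_t x, std::int32_t y, std::int32_t z) {
  return {x, y, z};
}
```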
but somehow RTL expansion ends up doing a read-modify-write of the whole TImode register for each field:
;; Generating RTL for gimple basic block 2
;; D.2953.x = x_2(D);
(insn 8 7 0 (set (subreg:DI (reg:TI 82 [ D.2953 ]) 0)
(reg/v:DI 84 [ x ])) "t.C":10:24 -1
(nil))
;; D.2953.y = y_4(D);
(insn 9 8 10 (set (reg:V4SI 87)
(vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 85 [ y ]))
(subreg:V4SI (reg:TI 82 [ D.2953 ]) 0)
(const_int 4 [0x4]))) "t.C":10:24 -1
(nil))
(insn 10 9 0 (set (reg:TI 82 [ D.2953 ])
(subreg:TI (reg:V4SI 87) 0)) "t.C":10:24 -1
(nil))
;; D.2953.z = z_6(D);
(insn 11 10 12 (set (reg:V4SI 88)
(vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 86 [ z ]))
(subreg:V4SI (reg:TI 82 [ D.2953 ]) 0)
(const_int 8 [0x8]))) "t.C":10:24 -1
(nil))
(insn 12 11 0 (set (reg:TI 82 [ D.2953 ])
(subreg:TI (reg:V4SI 88) 0)) "t.C":10:24 -1
(nil))
;; return D.2953;
!?
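To make insns 8 to 12 concrete, here is a sketch that replays them on a plain array, assuming little-endian lane numbering (lane 0 is the least significant 32 bits of reg:TI 82, as on x86-64). It follows the documented vec_merge semantics, where a set bit i in the mask takes lane i from the first operand and a clear bit takes it from the second. The helper names are hypothetical; the point is that each scalar field store becomes a full 128-bit merge plus a copy back into reg:TI 82, rather than a plain store to the return slot.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A V4SI value; lane 0 is the low 32 bits of the TImode register.
using V4SI = std::array<std::uint32_t, 4>;

// Models (vec_duplicate:V4SI (reg:SI ...)): broadcast one SImode value.
V4SI vec_duplicate(std::uint32_t s) { return {{s, s, s, s}}; }

// Models (vec_merge:V4SI a b (const_int mask)): lane i comes from a when
// bit i of mask is set, otherwise from b.
V4SI vec_merge(const V4SI &a, const V4SI &b, unsigned mask) {
  assert(mask < 16 && "V4SI has only four lanes");
  V4SI r{};
  for (unsigned i = 0; i < 4; ++i)
    r[i] = ((mask >> i) & 1u) ? a[i] : b[i];
  return r;
}

// Replays insns 8-12 for D.2953; reg:TI 82 starts undefined (zeroed here).
V4SI expand_like_rtl(std::uint64_t x, std::uint32_t y, std::uint32_t z) {
  V4SI d{};
  d[0] = static_cast<std::uint32_t>(x);        // insn 8: low DI subreg = x
  d[1] = static_cast<std::uint32_t>(x >> 32);
  d = vec_merge(vec_duplicate(y), d, 0x4);     // insns 9-10: y into lane 2
  d = vec_merge(vec_duplicate(z), d, 0x8);     // insns 11-12: z into lane 3
  return d;
}
```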