https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93897
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
      Known to fail|                            |10.0, 7.5.0, 8.3.1, 9.2.1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're "correctly" costing an extra spill and the two loads:

t.C:10:24: note: vect_model_store_cost: inside_cost = 16, prologue_cost = 40 .
0x59cc2a0 y_4(D) 1 times vec_construct costs 8 in prologue
0x59cc2a0 y_4(D) 1 times vector_store costs 16 in body
0x59cc2a0 y_4(D) 1 times vector_store costs 16 in epilogue
0x59cc2a0 y_4(D) 2 times scalar_load costs 16 in epilogue
0x59f1130 y_4(D) 1 times scalar_store costs 12 in body
0x59f1130 z_6(D) 1 times scalar_store costs 12 in body
t.C:10:24: note: Cost model analysis:
  Vector inside of basic block cost: 16
  Vector prologue cost: 8
  Vector epilogue cost: 32
  Scalar cost of basic block: 24
t.C:10:24: missed: not vectorized: vectorization is not profitable.

and expand from

  <bb 2> [local count: 1073741824]:
  D.2953.x = x_2(D);
  D.2953.y = y_4(D);
  D.2953.z = z_6(D);
  return D.2953;

but somehow RTL expansion ends up doing

;; Generating RTL for gimple basic block 2

;; D.2953.x = x_2(D);

(insn 8 7 0 (set (subreg:DI (reg:TI 82 [ D.2953 ]) 0)
        (reg/v:DI 84 [ x ])) "t.C":10:24 -1
     (nil))

;; D.2953.y = y_4(D);

(insn 9 8 10 (set (reg:V4SI 87)
        (vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 85 [ y ]))
            (subreg:V4SI (reg:TI 82 [ D.2953 ]) 0)
            (const_int 4 [0x4]))) "t.C":10:24 -1
     (nil))

(insn 10 9 0 (set (reg:TI 82 [ D.2953 ])
        (subreg:TI (reg:V4SI 87) 0)) "t.C":10:24 -1
     (nil))

;; D.2953.z = z_6(D);

(insn 11 10 12 (set (reg:V4SI 88)
        (vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 86 [ z ]))
            (subreg:V4SI (reg:TI 82 [ D.2953 ]) 0)
            (const_int 8 [0x8]))) "t.C":10:24 -1
     (nil))

(insn 12 11 0 (set (reg:TI 82 [ D.2953 ])
        (subreg:TI (reg:V4SI 88) 0)) "t.C":10:24 -1
     (nil))

;; return D.2953;

!?