https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> You sink the conversion, so it would be PRE on the reverse graph. The
> transform doesn't really fit a particular pass I think.
>
> Why does the problem persist in RTL?

Normally, combine eliminates the redundant move by folding the subreg into the pattern, like this:

(insn 19 17 21 3 (set (subreg:V4SI (reg/v:V2DI 103 [ vsum ]) 0)
        (unspec:V4SI [
                (subreg:V4SI (reg/v:V2DI 103 [ vsum ]) 0)
                (reg:V4SI 123 [ MEM[(__m128i * {ref-all})_52] ])
                (reg:V4SI 124)
            ] UNSPEC_VPDPBUSD)) "test.c":9:16 discrim 1 9182 {vpdpbusd_v4si}

But in this case, before combine runs, cse1/fwprop propagate the subreg in insn 21 from the inner loop to its use outside the loop (insn 28). Because (reg:V4SI 121) now has that additional use, combine fails to eliminate the redundant subreg move.

------loop_begin----------
...
(insn 19 18 20 3 (set (reg:V4SI 121)
        (unspec:V4SI [
                (reg:V4SI 122 [ vsum ])
                (reg:V4SI 123 [ MEM[(__m128i * {ref-all})_52] ])
                (reg:V4SI 124)
            ] UNSPEC_VPDPBUSD)) "test.c":9:16 discrim 1 9182 {vpdpbusd_v4si}
     (expr_list:REG_DEAD (reg:V4SI 125)
        (expr_list:REG_DEAD (reg:V4SI 123 [ MEM[(__m128i * {ref-all})_52] ])
            (expr_list:REG_DEAD (reg:V4SI 122 [ vsum ])
                (nil)))))
(insn 20 19 21 3 (set (reg:V4SI 102 [ _11 ])
        (reg:V4SI 121)) "test.c":9:16 discrim 1 1906 {movv4si_internal}
     (expr_list:REG_DEAD (reg:V4SI 121)
        (nil)))
(insn 21 20 22 3 (set (reg/v:V2DI 103 [ vsum ])
        (subreg:V2DI (reg:V4SI 121) 0)) "test.c":9:16 discrim 2 1909 {movv2di_internal}
     (nil))
...
---------loop_end---------
(note 27 26 28 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 28 27 29 4 (set (mem:V2DI (reg/v/f:DI 119 [ pc ]) [0 *pc_22(D)+0 S16 A128])
        (subreg:V2DI (reg:V4SI 121) 0)) "test.c":11:9 1909 {movv2di_internal}   --- propagated from insn 21
     (expr_list:REG_DEAD (reg/v/f:DI 119 [ pc ])
        (expr_list:REG_DEAD (reg/v:V2DI 103 [ vsum ])
            (nil))))
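A minimal Python sketch of the failure mode, under stated assumptions: this is a toy model, not GCC's combine or fwprop code, and `Insn`, `uses_of`, `can_fold_copy`, and `fwprop_copy` are hypothetical names. It reduces the combine step discussed here to a single-use rule (the vpdpbusd insn can be retargeted to write vsum directly, and the copy deleted, only if the subreg copy is the sole reader of reg 121) and shows how propagating the subreg into insn 28 adds a second reader.

```python
from dataclasses import dataclass, replace

# Toy model only; not GCC's data structures or pass logic.
@dataclass(frozen=True)
class Insn:
    uid: int
    dest: str         # what the insn writes (a pseudo register or memory)
    uses: frozenset   # pseudo registers the insn reads

def uses_of(insns, reg):
    """UIDs of every insn that reads `reg`, in insn order."""
    return [i.uid for i in insns if reg in i.uses]

def can_fold_copy(insns, producer_uid, copy_uid):
    """Toy single-use rule: the producer may be rewritten to set the copy's
    destination directly (deleting the copy) only if the copy is the sole
    reader of the producer's result."""
    producer = next(i for i in insns if i.uid == producer_uid)
    return uses_of(insns, producer.dest) == [copy_uid]

def fwprop_copy(insns, copy_uid, user_uid):
    """Toy forward propagation: in insn `user_uid`, replace the read of the
    copy's destination with the copy's source registers."""
    copy = next(i for i in insns if i.uid == copy_uid)
    return [
        replace(i, uses=(i.uses - {copy.dest}) | copy.uses)
        if i.uid == user_uid and copy.dest in i.uses else i
        for i in insns
    ]

# Simplified insns from the dump (insn 20 and the loop back edge omitted).
before = [
    Insn(19, "r121", frozenset({"r122", "r123", "r124"})),  # vpdpbusd, in the loop
    Insn(21, "r103", frozenset({"r121"})),                  # vsum = subreg (r121), in the loop
    Insn(28, "mem", frozenset({"r103"})),                   # *pc = vsum, after the loop
]
after = fwprop_copy(before, copy_uid=21, user_uid=28)

print(can_fold_copy(before, 19, 21))  # True: insn 21 is the only reader of r121
print(can_fold_copy(after, 19, 21))   # False: insn 28 now reads r121 too
```

The real passes track much more (REG_DEAD notes, modes, costs); the sketch only captures the extra-use argument made above.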