https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101
--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> Just looking at what we feed combine:
>
> (insn 9 8 15 2 (set (reg:V2DI 89)
>         (vec_concat:V2DI (reg:DI 90 [ num ])
>             (reg:DI 92))) "t.c":9:12 4182 {vec_concatv2di}
>      (expr_list:REG_DEAD (reg:DI 92)
>         (expr_list:REG_DEAD (reg:DI 90 [ num ])
>             (nil))))
> (insn 15 9 16 2 (set (reg/i:TI 0 ax)
>         (subreg:TI (reg:V2DI 89) 0)) "t.c":10:1 65 {*movti_internal}
>      (expr_list:REG_DEAD (reg:V2DI 89)
>         (nil)))
>
> I wonder why we can't "simplify" this into individual sets of the
> hardreg pair?  fwprop sees the same thing so that's another possible
> fixing point.  Not sure if the backend in the end would like to
> see the above TImode set decomposed though...

We surely want to decompose it in these testcases.  The big question is to find out in which pass to do that (one with reasonable infrastructure for it), and which costs and other conditions to check.  The testcases show something that is clearly undesirable regardless of costs: vec_concatenating scalar regs into a vector only to subreg it into a scalar hard reg...

But if the destination weren't a GPR, just a TImode pseudo that would benefit from being reloaded into a vector reg and operated on there, decomposing it wouldn't be a win.  On the other hand, if we don't get rid of those vector modes before reload, RA will choose vector registers for them.
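For readers without the attached t.c, here is a hypothetical C function of the general kind being discussed. It is an illustration under stated assumptions, not the testcase from this bug: a struct of two 64-bit integers, which the x86-64 SysV ABI returns in the RAX:RDX pair that the RTL above writes as (reg/i:TI 0 ax). The names u64_pair and make_pair are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical example (not the bug's t.c): a pair of 64-bit integers.
   On x86-64 SysV this struct is returned in RAX:RDX, which GCC models
   as a single TImode hard register starting at ax. */
typedef struct {
  uint64_t w0;
  uint64_t w1;
} u64_pair;

u64_pair make_pair(uint64_t num) {
  u64_pair p;
  /* Two independent scalar stores to adjacent fields.  With vectorization
     enabled, GCC may pack them into one V2DI value (vec_concat), which then
     has to be moved into the scalar return registers via a TImode subreg:
     the shape quoted in the RTL dump above.  The desirable code simply sets
     RAX and RDX from the two scalar values directly. */
  p.w0 = num << 1;
  p.w1 = num >> 1;
  return p;
}
```

Compiling a function like this at -O3 and comparing the assembly with and without -fno-tree-vectorize is one way to see whether the scalars are routed through an XMM register before landing in RAX:RDX.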