https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64477
--- Comment #4 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---

It is hard for me to consider this PR an RA fault. Pseudo 90 gets memory because its cost is 3000 for DIREG (4000 for any general reg) and 2000 for memory.

    2: r90:SI=di:SI
       REG_DEAD di:SI
    8: r95:V4SI=vec_merge(vec_duplicate(r90:SI),const_vector,0x1)

The cost of 4000 for GENERAL_REGS is a result of the description of insn 8:

    (define_insn "vec_set<mode>_0"
      [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
              "=Yr,*v,v,v ,x,x,v,Yr ,*x ,x ,m ,m ,m")
            (vec_merge:VI4F_128
              (vec_duplicate:VI4F_128
                (match_operand:<ssescalarmode> 2 "general_operand"
              " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
              (match_operand:VI4F_128 1 "vector_move_operand"
              " C , C,C,C ,C,0,v,0 ,0 ,x ,0 ,0 ,0")
              (const_int 1)))]
      "TARGET_SSE"
      "@
       %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
       %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
       %vmov<ssescalarmodesuffix>\t{%2, %0|%0, %2}
       %vmovd\t{%2, %0|%0, %2}
       movss\t{%2, %0|%0, %2}
       movss\t{%2, %0|%0, %2}
       vmovss\t{%2, %1, %0|%0, %1, %2}
       pinsrd\t{$0, %2, %0|%0, %2, 0}
       pinsrd\t{$0, %2, %0|%0, %2, 0}
       vpinsrd\t{$0, %2, %1, %0|%0, %1, %2, 0}
       #
       #
       #"

The description uses too many '*' modifiers, which excludes r (in operand 2 of alt#3) from consideration when choosing the pseudo's class. The corresponding insn, movd, is perfectly fine.

I believe '*' should be used only in rare cases. If you want to disfavor an alternative, e.g. because the corresponding insn is costly or it needs more than one insn (usually a split case), it is better to use '?' for this. It is better not to exclude constraints (reg classes) and to let RA choose the class itself based on costs.

Without the change, RA calculates the cost of GENERAL_REGS for pseudo 90 (operand 2) by moving it through memory. So the following change solves the problem (the '*' before 'r' is removed from alt#3):

    " Yr,*v,m,r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"

I'd also change the '*' before 'v' to '?', because we could then choose v (all SSE regs) instead of Yr (SSE regs that need no additional prefix in the insn) and still disparage it, as v results in a longer insn.

The only problem is that I guess there are a lot of such insn definitions, and analogous problems might appear in the future.

The first change was bootstrapped successfully with --with-cpu=core-avx2 and --with-arch=core-avx2.
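
To make the effect of '*' concrete, here is a rough Python sketch of which constraint letters remain visible when a register class is chosen for operand 2. It is only a toy model, not GCC's actual algorithm: the helper names are hypothetical, it assumes that '*' hides exactly the next constraint and that 'Y' starts a two-character constraint, and it does not model costs at all.

```python
# Toy model (NOT GCC code): which constraints of an operand are visible
# to register-class preferencing once '*' has hidden some of them.

# Operand 2 constraint strings quoted in the comment above.
ORIGINAL = " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"
FIXED = " Yr,*v,m,r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"


def visible_constraints(alternative):
    """Return the constraints of one alternative that '*' does not hide.

    Assumptions of this sketch: '*' hides only the next constraint,
    '?' and '!' merely penalize the alternative, and 'Y' introduces a
    two-character constraint such as 'Yr'.
    """
    text = alternative.strip()
    tokens = []
    hidden = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            hidden = True          # next constraint is ignored for preferencing
            i += 1
            continue
        if ch in "?!=+&%":
            i += 1                 # modifiers, not register classes
            continue
        width = 2 if ch == "Y" and i + 1 < len(text) else 1
        if not hidden:
            tokens.append(text[i:i + width])
        hidden = False
        i += width
    return tokens


def alternatives_offering(constraint, operand_string):
    """0-based indices of alternatives where `constraint` is still visible."""
    return [
        index
        for index, alt in enumerate(operand_string.split(","))
        if constraint in visible_constraints(alt)
    ]
```

In this model no alternative of the original string offers 'r' for class selection, while the changed string offers it in alt#3. A '?' in place of '*' would keep 'v' visible and only penalize the alternative.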