https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64477

--- Comment #4 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
It is hard for me to consider this PR an RA fault.  Pseudo 90 gets memory
because its cost is 3000 for DIREG (4000 for any general reg) but only 2000
for memory.

    2: r90:SI=di:SI
      REG_DEAD di:SI
    8: r95:V4SI=vec_merge(vec_duplicate(r90:SI),const_vector,0x1)
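
As a rough illustration of the comparison above, here is a toy model in
Python.  It is not IRA's actual code; the dictionary layout and the function
name are made up for this sketch, and only the three cost values come from the
dump.

```python
# Costs quoted above for pseudo 90.  The dictionary layout is hypothetical;
# only the numbers are taken from the comment.
costs = {"DIREG": 3000, "GENERAL_REGS": 4000, "MEM": 2000}


def cheapest_location(costs):
    """Return the name of the location with the lowest cost."""
    return min(costs, key=costs.get)


# With these numbers memory is cheaper than any register class,
# so the pseudo ends up in memory.
print(cheapest_location(costs))
```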

The cost of 4000 for GENERAL_REGS is a result of the description of insn 8:

(define_insn "vec_set<mode>_0"
  [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
          "=Yr,*v,v,v ,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
        (vec_merge:VI4F_128
          (vec_duplicate:VI4F_128
            (match_operand:<ssescalarmode> 2 "general_operand"
          " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
          (match_operand:VI4F_128 1 "vector_move_operand"
          " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
          (const_int 1)))]
  "TARGET_SSE"
  "@
   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
   %vmov<ssescalarmodesuffix>\t{%2, %0|%0, %2}
   %vmovd\t{%2, %0|%0, %2}
   movss\t{%2, %0|%0, %2}
   movss\t{%2, %0|%0, %2}
   vmovss\t{%2, %1, %0|%0, %1, %2}
   pinsrd\t{$0, %2, %0|%0, %2, 0}
   pinsrd\t{$0, %2, %0|%0, %2, 0}
   vpinsrd\t{$0, %2, %1, %0|%0, %1, %2, 0}
   #
   #
   #"

The description uses too many '*' modifiers, which excludes r (in operand 2 of
alt#3) from consideration when choosing the pseudo class.  The corresponding
insn, movd, is perfectly fine.  I believe * should be used only in rare cases.
If you want to disfavor an alternative, e.g. because the corresponding insn is
costly or it needs more than one insn (usually a split case), it is better to
use '?' for this.  It is better not to exclude constraints (reg classes) and
to let RA choose the class itself based on costs.  Without the change, RA
calculates the cost of GENERAL_REGS for pseudo 90 (operand 2) by moving it
through memory.

So the following change solves the problem (the * before 'r' is removed from
alt#3):

" Yr,*v,m,r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"
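
The effect of '*' on the operand 2 constraint string can be sketched with a
small Python model.  This is a deliberate simplification, not GCC's constraint
parser: it assumes '*' hides exactly the next constraint from class
preferencing, treats 'Y' as the start of a two-character constraint name, and
the function names are hypothetical.

```python
# Constraint modifiers that are not register classes (simplified set).
MODIFIERS = set("=+&%?!^$")


def visible_constraints(alternative):
    """Constraints in one alternative that stay visible for class preferencing.

    Simplified model: '*' hides the constraint that follows it, and 'Y'
    starts a two-character constraint name such as 'Yr'.
    """
    visible = []
    hide_next = False
    i = 0
    while i < len(alternative):
        ch = alternative[i]
        if ch == "*":
            hide_next = True
            i += 1
            continue
        if ch in MODIFIERS or ch.isspace():
            i += 1
            continue
        width = 2 if ch == "Y" else 1
        name = alternative[i:i + width]
        if not hide_next:
            visible.append(name)
        hide_next = False
        i += width
    return visible


def visible_per_alternative(constraint):
    """Apply visible_constraints to each comma-separated alternative."""
    return [visible_constraints(alt) for alt in constraint.split(",")]


original = " Yr,*v,m,*r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"
proposed = " Yr,*v,m,r,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"

# alt#3 (counting from 0): 'r' is hidden before the change, visible after it.
print(visible_per_alternative(original)[3])
print(visible_per_alternative(proposed)[3])
```

In this model the original string leaves no visible 'r' in any alternative
of operand 2, while the proposed string exposes it in alt#3.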

I'd also change the * before 'v' to '?', because we could then choose v (all
SSE regs) instead of Yr (SSE regs that do not need an additional prefix in the
insn) and still disparage it, as v results in a longer insn.

The only problem is that I guess there are a lot of such insn definitions, and
analogous problems might show up in the future.

The first change was bootstrapped successfully with --with-cpu=core-avx2 and
--with-arch=core-avx2.
