https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #28 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason. 
> Try this:

The latency problem with the original testcase is solved with:

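;; Build a V2DI from an SSE register (low half) and a general register
;; or memory operand (high half) by going through a scratch SSE register.
(define_peephole2
  [(match_scratch:DI 3 "Yv")		;; operand 3: a free SSE register
   (set (match_operand:V2DI 0 "sse_reg_operand")
        (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
                         (match_operand:DI 2 "nonimmediate_gr_operand")))]
  ""
  [(set (match_dup 3) (match_dup 2))	;; first copy operand 2 into the scratch
   (set (match_dup 0)			;; then concatenate two SSE registers
        (vec_concat:V2DI (match_dup 1) (match_dup 3)))])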

but I don't know if this transformation applies to all x86 targets.
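
For readers less used to RTL, the two forms of the pattern can be sketched in C
with the GCC/Clang vector extension. This is only an illustration of what the
peephole rewrites, not the original testcase from this PR: the names
concat_direct, concat_via_scratch, same_lanes and check_equivalence are
hypothetical, and which instructions the compiler actually emits for either
function depends on the GCC version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes, corresponding to V2DI mode in the pattern above.  */
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Form matched by the peephole: the high lane comes directly from a
   scalar value (general register or memory).  */
static v2di
concat_direct (v2di lo_vec, int64_t hi)
{
  return (v2di) { lo_vec[0], hi };
}

/* Form produced by the peephole: copy the scalar into a temporary
   vector first (the scratch, operand 3), then concatenate the low
   lanes of two vectors.  */
static v2di
concat_via_scratch (v2di lo_vec, int64_t hi)
{
  v2di scratch = { hi, 0 };
  return (v2di) { lo_vec[0], scratch[0] };
}

/* Lane-by-lane comparison helper (hypothetical, for checking only).  */
static int
same_lanes (v2di a, v2di b)
{
  return a[0] == b[0] && a[1] == b[1];
}

/* Both forms must yield the same vector; only the data path differs.  */
static void
check_equivalence (void)
{
  v2di x = { 1, 99 };
  v2di y = { -5, 7 };
  assert (same_lanes (concat_direct (x, 2), concat_via_scratch (x, 2)));
  assert (same_lanes (concat_direct (y, INT64_MIN),
		      concat_via_scratch (y, INT64_MIN)));
}
```

The two functions are semantically identical; the peephole only changes how
the high half reaches the destination register.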
