[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure

rguenth at gcc dot gnu.org Wed, 27 Mar 2019 04:37:08 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101


--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #15)
> (In reply to Richard Biener from comment #14)
> > Just looking at what we feed combine:
> > 
> > (insn 9 8 15 2 (set (reg:V2DI 89)
> >         (vec_concat:V2DI (reg:DI 90 [ num ])
> >             (reg:DI 92))) "t.c":9:12 4182 {vec_concatv2di}
> >      (expr_list:REG_DEAD (reg:DI 92)
> >         (expr_list:REG_DEAD (reg:DI 90 [ num ])
> >             (nil))))
> > (insn 15 9 16 2 (set (reg/i:TI 0 ax)
> >         (subreg:TI (reg:V2DI 89) 0)) "t.c":10:1 65 {*movti_internal}
> >      (expr_list:REG_DEAD (reg:V2DI 89)
> >         (nil)))
> > 
> > I wonder why we can't "simplify" this into individual sets of the
> > hardreg pair?  fwprop sees the same thing so that's another possible
> > fixing point.  Not sure if the backend in the end would like to
> > see the above TImode set decomposed though...
> 
> We surely want to decompose it in these testcases.  The big question is find
> out in which pass to do that (which has a reasonable infrastructure), what
> cost and what not to check etc.  The testcase show something that is clearly
> undesirable without any costs, vec_concating scalar regs into a vector only
> to subreg it into a scalar hard reg...  But now, if it wasn't into a GPR
> reg, but just TImode in some pseudo that it would be beneficial to reload
> into a vector reg and then operate in vector reg, it wouldn't be a win.  On
> the other side, if we don't get rid of those vector modes before reload, RA
> will choose vector registers for those.

I wonder if we should turn
  (subreg:TI (vec_concat:... ))
into
  (set (subreg:DI (reg:TI ...) 0) ...)
  (set (subreg:DI (reg:TI ...) 8) ...)
which is what we handle nicely, it seems.  That means something has to split
out the subreg into a separate instruction again, or we need to make
fwprop1 not convert

(insn 9 8 10 2 (set (reg:V2DI 89)
        (vec_concat:V2DI (reg:DI 90)
            (reg:DI 92))) "t.c":9:12 4182 {vec_concatv2di}
     (nil))
(insn 10 9 11 2 (set (reg:TI 86 [ D.1921 ])
        (subreg:TI (reg:V2DI 89) 0)) "t.c":9:12 65 {*movti_internal}
     (nil))
(insn 11 10 15 2 (set (reg:TI 87 [ <retval> ])
        (reg:TI 86 [ D.1921 ])) "t.c":9:12 65 {*movti_internal}
     (nil))
(insn 15 11 16 2 (set (reg/i:TI 0 ax)
        (reg:TI 87 [ <retval> ])) "t.c":10:1 65 {*movti_internal}
     (nil))

into

(insn 9 8 15 2 (set (reg:V2DI 89)
        (vec_concat:V2DI (reg:DI 90 [ num ])
            (reg:DI 92))) "t.c":9:12 4182 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 92)
        (expr_list:REG_DEAD (reg:DI 90 [ num ])
            (nil))))
(insn 15 9 16 2 (set (reg/i:TI 0 ax)
        (subreg:TI (reg:V2DI 89) 0)) "t.c":10:1 65 {*movti_internal}
     (expr_list:REG_DEAD (reg:V2DI 89)
        (nil)))

but instead massage it into the above suggested form.
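
For readers without the testcase at hand, here is a minimal C sketch of the
kind of function the bug title describes, plus a model of the suggested form.
The type name, field names, function names and the arithmetic are assumptions
made for illustration; this is not necessarily the testcase attached to the
bug.  store_halves() only models the two DImode sets at byte offsets 0 and 8
of a TImode value; it says nothing about which pass should do the split.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "trivial pair-of-uint64_t structure"
   from the bug title.  Field names are assumptions.  */
typedef struct {
  uint64_t w0;
  uint64_t w1;
} uint64_pair_t;

/* On x86-64 (SysV ABI) a 16-byte struct of two integers is returned
   in the rax:rdx register pair, which matches the TImode hard
   register set in the dumps above.  */
uint64_pair_t make_pair(int num)
{
  uint64_pair_t p;
  p.w0 = (uint64_t)num << 1;
  p.w1 = (uint64_t)num >> 1;
  return p;
}

/* Model of the suggested form: two independent 8-byte stores at byte
   offsets 0 and 8 of a 16-byte object, mirroring
   (subreg:DI (reg:TI ...) 0) and (subreg:DI (reg:TI ...) 8),
   instead of building a V2DI vector and reinterpreting it as TImode.  */
void store_halves(unsigned char ti[16], uint64_t lo, uint64_t hi)
{
  memcpy(ti + 0, &lo, sizeof lo);
  memcpy(ti + 8, &hi, sizeof hi);
}
```

Both halves come from scalar registers and end up in scalar registers, so
the two stores are all the data movement that is needed; the vec_concat
followed by the TImode subreg is a detour through a vector register.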

