[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure

jakub at gcc dot gnu.org Wed, 27 Mar 2019 04:16:22 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101


--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> Just looking at what we feed combine:
> 
> (insn 9 8 15 2 (set (reg:V2DI 89)
>         (vec_concat:V2DI (reg:DI 90 [ num ])
>             (reg:DI 92))) "t.c":9:12 4182 {vec_concatv2di}
>      (expr_list:REG_DEAD (reg:DI 92)
>         (expr_list:REG_DEAD (reg:DI 90 [ num ])
>             (nil))))
> (insn 15 9 16 2 (set (reg/i:TI 0 ax)
>         (subreg:TI (reg:V2DI 89) 0)) "t.c":10:1 65 {*movti_internal}
>      (expr_list:REG_DEAD (reg:V2DI 89)
>         (nil)))
> 
> I wonder why we can't "simplify" this into individual sets of the
> hardreg pair?  fwprop sees the same thing so that's another possible
> fixing point.  Not sure if the backend in the end would like to
> see the above TImode set decomposed though...

We surely want to decompose it in these testcases.  The big question is to
find out in which pass to do that (one which has a reasonable infrastructure),
which costs and whatnot to check, etc.  The testcases show something that is
clearly undesirable even without consulting any costs: vec_concating scalar
regs into a vector only to subreg it into a scalar hard reg...  But if the
destination weren't a GPR hard reg, but just a TImode pseudo that would be
beneficial to reload into a vector reg and then operate on in a vector reg,
decomposing wouldn't be a win.  On the other hand, if we don't get rid of
those vector modes before reload, RA will choose vector registers for those.
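
For readers without the testcase at hand, the kind of source that leads to
the RTL quoted above can be sketched roughly as follows.  This is only a
hypothetical reduction inferred from the bug title and from the "num" register
annotation in the dump; it is not the actual t.c, and the names pair64 and
make_pair64 as well as the constant are made up for illustration.  On x86-64
with the System V ABI, a struct of two 64-bit integers is returned in the
rax:rdx pair, which is what the TImode set of hard reg ax represents.

```c
#include <stdint.h>

/* Hypothetical reduction (not the t.c from the PR): a trivial structure
   holding two 64-bit integers, returned by value.  */
struct pair64 {
    uint64_t first;
    uint64_t second;
};

/* Both members are plain scalars that end up in general purpose registers,
   so ideally each one is simply moved into its half of the return pair.
   The problematic code instead builds a V2DI vector from the two scalars
   and then moves that vector into the scalar return registers.  */
struct pair64 make_pair64(int num)
{
    struct pair64 p = { (uint64_t)num, 0xdeadbeefULL };
    return p;
}
```

Compiling something like this with gcc -O3 -S on an affected release and
comparing against -O2 should make the difference visible, though whether this
exact reduction reproduces the vec_concat is an assumption.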

[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure
