https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108141

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
[...]
... From this POV I think r13-4727 is actually a step backwards
> because previously we were at least loading it into GPR, moving to SSE and
> broadcasting there,
> while now we move into GPR, spill to memory and broadcast from memory.
> Before combine we have:
> (insn 2 8 3 2 (set (reg:SI 120 [ x ])
>         (mem/c:SI (reg/f:SI 16 argp) [2 x+0 S4 A32])) "pr64110.c":11:1 83
> {*movsi_internal}
>      (nil))
> (insn 3 2 4 2 (set (reg/v:HI 119 [ x ])
>         (subreg:HI (reg:SI 120 [ x ]) 0)) "pr64110.c":11:1 84
> {*movhi_internal}
>      (expr_list:REG_DEAD (reg:SI 120 [ x ])
>         (nil)))
> ...
> and in another bb
> (insn 63 140 35 3 (set (reg:V8HI 140)
>         (vec_duplicate:V8HI (reg/v:HI 119 [ x ]))) "pr64110.c":16:7 7985
> {*vec_dupv8hi}
>      (nil))
> (insn 35 63 18 3 (set (reg:V16HI 141 [ vect_cst__52 ])
>         (vec_duplicate:V16HI (reg/v:HI 119 [ x ]))) 7984 {*vec_dupv16hi}
>      (nil))
> so I bet that is the reason why combine doesn't merge those into just the
> broadcast.

Yep.  And probably fwprop doesn't consider MEMs (or even two defs) at all.
I suppose we don't want to combine insn 2 + 3 into a HImode MEM by itself?
OTOH there's no fwprop after combine.

> As for the xmm vs. ymm, it is only loop-invariant that moves those 2 dups
> (insn 63 and 35) next to each other, and the question is what kind of
> optimization pass could figure out that insn 35 is a superset of insn 63 and
> change it into insn 35 + lowpart subreg to set pseudo 140 from low half of
> 141.

The only options I can think of are a peephole or, alternatively, a
scheduling heuristic + CSE (we need the V16HI duplicate before the V8HI one).
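
To make the "superset" relation concrete, here is a minimal C sketch using
GCC/Clang generic vector extensions. It only illustrates that the low half
of a 16-lane broadcast equals the 8-lane broadcast of the same scalar. The
type and function names (v8hi, v16hi, dup_v8hi, dup_v16hi, lowpart_v16hi,
lowpart_matches_narrow_dup) are made up for illustration and are not taken
from pr64110.c or from GCC itself.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang generic vector types standing in for the V8HI and V16HI
   modes; the names are illustrative only.  */
typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));

/* Broadcast x into all 8 lanes, analogous to insn 63.  */
static void
dup_v8hi (short x, v8hi *out)
{
  for (int i = 0; i < 8; i++)
    (*out)[i] = x;
}

/* Broadcast x into all 16 lanes, analogous to insn 35.  */
static void
dup_v16hi (short x, v16hi *out)
{
  for (int i = 0; i < 16; i++)
    (*out)[i] = x;
}

/* Copy the lowest-addressed 16 bytes (lanes 0..7) of the wide vector,
   standing in for a lowpart subreg of the V16HI value.  */
static void
lowpart_v16hi (const v16hi *wide, v8hi *out)
{
  assert (sizeof *out <= sizeof *wide);
  memcpy (out, wide, sizeof *out);
}

/* Return 1 if the low half of the wide broadcast is bit-identical to
   the narrow broadcast of the same scalar, 0 otherwise.  */
int
lowpart_matches_narrow_dup (short x)
{
  v8hi narrow, low;
  v16hi wide;

  dup_v8hi (x, &narrow);
  dup_v16hi (x, &wide);
  lowpart_v16hi (&wide, &low);
  return memcmp (&narrow, &low, sizeof narrow) == 0;
}
```

Calling lowpart_matches_narrow_dup with any scalar should return 1, which
is the property an optimization would rely on to replace the second
broadcast with a lowpart of the first.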

CSE could also tentatively record "larger" computations and modify the
earlier stmt if uses of that larger computation appear.
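
As a rough illustration of that idea (not how GCC's CSE works), the toy
pass below walks a flat instruction array. When it meets a wider duplicate
of a scalar that an earlier insn already duplicated more narrowly, it
widens the earlier insn and defines the narrow register as a lowpart right
after it. Everything here (struct insn, enum op, widen_earlier_dups) is
hypothetical, and the sketch assumes each register is set exactly once and
the scalar source is not modified between the two insns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy IR, hypothetical and far simpler than RTL.  */
enum op { OP_DUP, OP_LOWPART, OP_OTHER };

struct insn
{
  enum op op;
  int dest;   /* register being set */
  int src;    /* scalar for OP_DUP, wide vector register for OP_LOWPART */
  int lanes;  /* number of lanes in dest */
};

/* For every OP_DUP that is wider than an earlier OP_DUP of the same
   scalar, move the wide duplicate into the earlier slot and define the
   narrow register as a lowpart immediately after it.  Assumes every
   register is set once and src is unchanged in between.  Returns the
   number of rewrites performed.  */
int
widen_earlier_dups (struct insn *insns, int n)
{
  int changed = 0;

  assert (insns != NULL || n == 0);
  for (int i = 0; i < n; i++)
    {
      if (insns[i].op != OP_DUP)
        continue;
      for (int j = 0; j < i; j++)
        {
          if (insns[j].op != OP_DUP
              || insns[j].src != insns[i].src
              || insns[j].lanes >= insns[i].lanes)
            continue;

          struct insn wide = insns[i];
          struct insn low
            = { OP_LOWPART, insns[j].dest, wide.dest, insns[j].lanes };

          /* Shift insns j+1 .. i-1 down by one slot so the lowpart can
             sit directly after the widened duplicate; this keeps every
             later use of the narrow register after its definition.  */
          memmove (&insns[j + 2], &insns[j + 1],
                   (size_t) (i - j - 1) * sizeof *insns);
          insns[j] = wide;
          insns[j + 1] = low;
          changed++;
          break;
        }
    }
  return changed;
}
```

Fed the two duplicates from the dump in their original order (the V8HI one
into pseudo 140, then the V16HI one into pseudo 141, both from pseudo 119),
the sketch leaves a single V16HI duplicate into 141 followed by a lowpart
that sets 140 from 141.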
