https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116256

--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So what's left here is the dup-{1,2,3} cases.

IMHO this all ties back to the constant synthesis problem.  They can be fixed
by removing the mvconst_internal pattern -- but that leads to a new set of
regressions which don't look tractable to solve.

A great example would be and-shift32.c, but there are others.  The fundamental
problem is that exposing synthesis to combine means combine has to look at
more instructions, and it can combine at most 4 insns at a time.  The
mvconst_internal pattern is acting like a bridge to allow other combine
patterns to trigger.
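
To illustrate why exposed synthesis lengthens the chain combine has to examine,
here is a minimal sketch (not GCC code) of the usual RISC-V lui/addi split for a
32-bit constant.  The names split_const and rebuild are hypothetical helpers
made up for this example.  Any constant that doesn't fit a 12-bit signed
immediate costs at least two insns before the insn that actually uses it.

```c
#include <stdint.h>

/* Hypothetical illustration only, not taken from the RISC-V backend.
   hi20 is the immediate for lui, lo12 the immediate for addi.  */
struct split {
    int32_t hi20;
    int32_t lo12;
};

static struct split split_const(int32_t value)
{
    struct split s;

    /* addi sign-extends its 12-bit immediate, so treat the low 12 bits
       as a signed quantity in [-2048, 2047].  */
    s.lo12 = (int32_t)((uint32_t)value & 0xfff);
    if (s.lo12 >= 0x800)
        s.lo12 -= 0x1000;

    /* The upper part compensates for a negative low part.  */
    s.hi20 = (int32_t)(((uint32_t)value - (uint32_t)s.lo12) >> 12) & 0xfffff;
    return s;
}

/* Recompute the constant the way lui followed by addi would (32-bit).  */
static int32_t rebuild(struct split s)
{
    return (int32_t)(((uint32_t)s.hi20 << 12) + (uint32_t)s.lo12);
}

/* Number of insns in this simple scheme: one if the value fits addi's
   immediate or has no low part, two otherwise.  */
static int insn_count(int32_t value)
{
    struct split s = split_const(value);
    if (value >= -2048 && value <= 2047)
        return 1;
    return s.lo12 == 0 ? 1 : 2;
}
```

With a two-insn constant feeding, say, a shift and an and, the sequence is
already at combine's limit before anything else joins the chain.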

In and-shift32.c we would need 5-insn combination support to bring together all
the necessary insns to optimize that case without mvconst_internal.  combine
doesn't handle REG_EQUAL notes well and fixing that looks fairly painful.

Basically I don't see a path right now to remove mvconst_internal without
significant combiner surgery.  Worse yet, that surgery will hit the REG_DEAD
note distribution code, which is one of the hairier parts of combine.

Trying to tackle the dup-{1,2,3} cases after reload is doomed to failure IMHO
because we re-use the output register from synthesis as a scratch in the
synthesis sequence.  That inhibits the post-reload optimizers significantly,
primarily reload_cse, but also vsetvl optimization for larger vector lengths.

I think all that tends to argue that a local cprop (but not full cse) pass may
be the only path forward here, which I'll explore next.
