https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116256
--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So what's left here is the dup-{1,2,3} cases.

IMHO this all ties back to the constant synthesis problem. They can be fixed by removing the mvconst_internal pattern -- but that leads to a new set of regressions which don't look tractable to solve. A great example would be and-shift32.c, but there are others.

The fundamental problem is that exposing synthesis to combine means combine has to look at more instructions, and it is limited to combining at most 4 insns at a time. The mvconst_internal pattern is acting like a bridge to allow other combine patterns to trigger. In and-shift32.c we would need 5-insn combination support to bring together all the necessary insns to optimize that case without mvconst_internal.

combine doesn't handle REG_EQUAL notes well, and fixing that looks fairly painful. Basically I don't see a path right now to remove mvconst_internal without significant combiner surgery. Worse yet, that surgery will hit the REG_DEAD note distribution code, which is one of the hairier parts of combine.

Trying to tackle the dup-{1,2,3} cases after reload is doomed to failure IMHO because we re-use the output register from synthesis as a scratch in the synthesis sequence. That inhibits the post-reload optimizers significantly, primarily reload_cse, but also vsetvl optimization for larger vector lengths.

I think all that tends to argue that a local cprop (but not full cse) pass may be the only path forward here, which I'll explore next.
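
For readers unfamiliar with the term, here is a rough sketch of what a block-local constant propagation does. It is a toy model in Python, not GCC code and not the pass proposed above: the instruction tuples, the opcode names (`li`, `addi`, `slli`) and the function `local_cprop` are all hypothetical, and it assumes every instruction writes the register named in its second field. It tracks which registers hold known constants within one straight-line block and folds any instruction whose source is known into a single constant load, even when the destination register is reused as its own scratch.

```python
# Toy model only: opcode names and the tuple layout are hypothetical,
# not GCC RTL and not the pass discussed in this bug.

MASK64 = (1 << 64) - 1  # assume 64-bit registers


def local_cprop(block):
    """Fold constants forward through one straight-line block.

    Each instruction is a tuple (op, dst, ...); we assume dst is always
    the register written. Nothing is tracked across block boundaries.
    """
    known = {}  # register -> constant value known at this point
    out = []
    for insn in block:
        op, dst = insn[0], insn[1]
        if op == "li":                              # li dst, imm
            value = insn[2] & MASK64
        elif op == "addi" and insn[2] in known:     # addi dst, src, imm
            value = (known[insn[2]] + insn[3]) & MASK64
        elif op == "slli" and insn[2] in known:     # slli dst, src, shamt
            value = (known[insn[2]] << insn[3]) & MASK64
        else:
            value = None                            # result not a known constant

        if value is None:
            known.pop(dst, None)                    # dst is clobbered
            out.append(insn)
        else:
            known[dst] = value
            out.append(("li", dst, value))          # replace with a direct load
    return out


# A synthesis-like sequence that reuses a0 as its own scratch register.
block = [
    ("li", "a0", 1),
    ("slli", "a0", "a0", 12),
    ("addi", "a0", "a0", 5),
    ("add", "a1", "a0", "a2"),
]
for insn in local_cprop(block):
    print(insn)
```

In this toy, the three-instruction sequence collapses to three loads of which only the last is live; a separate dead-code pass would be needed to delete the first two. The sketch does no value numbering or expression matching, which is the sense in which it is "local cprop" rather than full CSE.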