https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 14 Oct 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
>
> --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Yeah, it isn't e.g. something RTL CSE would naturally do, because there is no
> common subexpression, this needs to know that a narrower broadcast is a part of
> a wider broadcast of the same argument and know how to replace that with a
> backend instruction that takes the low bits from it (while it actually usually
> expands to no code, at least before RA it needs to be expressed some way and is
> very backend specific, we don't allow a vector mode to vector mode subreg with
> different size).  So the only place to deal with this in RTL would be some
> backend specific pass I'm afraid.

So what RTL CSE would need to do is when seeing

  (set reg:VNQI ...)

know (via a target hook?) which subregs can be accessed at zero-cost
and register the appropriate smaller vector sets with a subreg value.
That probably makes sense only after reload to not constrain RA too much.
It could be restricted to vec_duplicate since there it's easy to
derive the lowpart expression to register.
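
For illustration only (this is not the testcase from the PR), here is a small
portable C sketch of the property being relied on: the low part of a wide
broadcast is byte-for-byte the same as a narrower broadcast of the same
scalar.  Plain byte arrays stand in for vector registers, the sizes 64 and 32
are arbitrary assumptions, and the helper names broadcast and lowpart_matches
are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed sizes: a 64-byte "wide" vector and a 32-byte "narrow" one. */
enum { WIDE_BYTES = 64, NARROW_BYTES = 32 };

/* Hypothetical stand-in for a vec_duplicate of a QImode scalar:
   fill all n byte elements of dst with x. */
static void broadcast(unsigned char *dst, size_t n, unsigned char x)
{
    memset(dst, x, n);
}

/* Return 1 if the narrow broadcast of x equals the low NARROW_BYTES
   bytes of the wide broadcast of x, 0 otherwise. */
static int lowpart_matches(unsigned char x)
{
    unsigned char wide[WIDE_BYTES];
    unsigned char narrow[NARROW_BYTES];

    broadcast(wide, sizeof wide, x);
    broadcast(narrow, sizeof narrow, x);

    /* Compare only the low part of the wide vector. */
    return memcmp(wide, narrow, sizeof narrow) == 0;
}
```

Because the equality holds for every scalar value, the narrow set could in
principle be recorded as a lowpart of the wide one, provided the target says
that lowpart access is free.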