https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 14 Oct 2019, rguenther at suse dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080
>
> --- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
> On Mon, 14 Oct 2019, jakub at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080
> >
> > Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> >
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |jakub at gcc dot gnu.org
> >
> > --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Yeah, it isn't e.g. something RTL CSE would naturally do, because there is
> > no common subexpression; this needs to know that a narrower broadcast is a
> > part of a wider broadcast of the same argument and know how to replace that
> > with a backend instruction that takes the low bits from it (while it
> > actually usually expands to no code, at least before RA it needs to be
> > expressed some way and is very backend specific; we don't allow a vector
> > mode to vector mode subreg with different size).  So the only place to deal
> > with this in RTL would be some backend specific pass, I'm afraid.
>
> So what RTL CSE would need to do is, when seeing
>
>  (set reg:VNQI ...)
>
> know (via a target hook?) which subregs can be accessed at zero cost
> and register the appropriate smaller vector sets with a subreg value.
> That probably makes sense only after reload, to not constrain RA
> too much.  It could be restricted to vec_duplicate, since there
> it's easy to derive the lowpart expression to register.

Or IRA/LRA rematerialization / inheritance could be taught to do this.
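
For reference, the property being discussed (a narrower broadcast is the low
part of a wider broadcast of the same argument) can be sketched at the source
level.  This is only an illustration, not GCC code: it assumes the GCC/Clang
vector_size extension, and every name in it (vec16u8, vec32u8, broadcast16,
broadcast32, lowpart16, narrow_is_lowpart_of_wide, check_all_bytes) is made up
for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; GCC/Clang vector extension types of 16 and 32 bytes. */
typedef unsigned char vec16u8 __attribute__((vector_size(16)));
typedef unsigned char vec32u8 __attribute__((vector_size(32)));

/* Broadcast x into all 16 byte lanes (the "narrower broadcast"). */
static void broadcast16(vec16u8 *out, unsigned char x)
{
    vec16u8 v = {0};
    for (int i = 0; i < 16; i++)
        v[i] = x;
    *out = v;
}

/* Broadcast x into all 32 byte lanes (the "wider broadcast"). */
static void broadcast32(vec32u8 *out, unsigned char x)
{
    vec32u8 v = {0};
    for (int i = 0; i < 32; i++)
        v[i] = x;
    *out = v;
}

/* Take the low 16 bytes of the wide vector.  This plays the role of the
   lowpart access that would ideally cost nothing on the target.  */
static void lowpart16(vec16u8 *out, const vec32u8 *wide)
{
    memcpy(out, wide, sizeof *out);
}

/* Return 1 if the low half of the wide broadcast of x is bytewise equal
   to the narrow broadcast of x, 0 otherwise.  */
int narrow_is_lowpart_of_wide(unsigned char x)
{
    vec32u8 wide;
    vec16u8 narrow, low;

    broadcast32(&wide, x);
    broadcast16(&narrow, x);
    lowpart16(&low, &wide);
    return memcmp(&low, &narrow, sizeof low) == 0;
}

/* Check the property for every possible byte value. */
void check_all_bytes(void)
{
    for (int x = 0; x < 256; x++)
        assert(narrow_is_lowpart_of_wide((unsigned char) x));
}
```

Because the equality holds for any argument, a second, narrower broadcast of
the same value is redundant once the wider one exists, which is what the
approaches above would have to discover.  How the low part is actually
expressed in RTL remains target specific, as noted in comment #2.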