https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> If we had a way to generate XImode directly from 4 V16QI, and only generate
> one move statement, then the register allocator would act better.

That, or split the XI register move into four V16QI/V4SI moves and generate the subreg only for the final move. I think the latter is really the best option, and it is what the lower-subreg.c pass should be doing, but for some reason it is not...