https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753
--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Wilco from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > lower-subreg should have be able to help here. I wonder why it did not
> > > ...
> >
> > I'm not sure how it can help.
>
> I think you misunderstood what this pass does.
> It does exactly what you think it should do:
> /* Decompose multi-word pseudo-registers into individual
>    pseudo-registers when possible and profitable.  This is possible
>    when all the uses of a multi-word register are via SUBREG, or are
>    copies of the register to another location.  Breaking apart the
>    register permits more CSE and permits better register allocation.
>
> The only difference is the creating part which missing.

Yes, but the issue is that you can't remove all the subregs, since the TBX instructions really need a 512-bit register.

The slim dump for x1 = vqtbx4q_u8(x1, table, x1):

   30: r94:XI#0=r105:V16QI
   31: r95:XI=r94:XI
      REG_DEAD r94:XI
   32: r95:XI#16=r101:V16QI
   33: r96:XI=r95:XI
      REG_DEAD r95:XI
   34: r96:XI#32=r102:V16QI
   35: r97:XI=r96:XI
      REG_DEAD r96:XI
   36: r97:XI#48=r106:V16QI
   38: r100:V16QI=unspec[r100:V16QI,r97:XI,r100:V16QI] 186
      REG_DEAD r97:XI

As you can see, it creates the 512-bit XI register via a convoluted sequence of 4 subreg lvalues.
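To make the constraint concrete, here is a minimal portable C sketch. It assumes plain structs and memcpy as stand-ins for the V16QI and XI modes; `v16qi_t`, `xi_t`, `build_xi` and `tbx4` are hypothetical names for illustration, not GCC or arm_neon.h APIs. `build_xi` copies four 16-byte values into one 64-byte object at byte offsets 0, 16, 32 and 48, mirroring the subreg writes in insns 30-36. `tbx4` models the lookup semantics of `vqtbx4q_u8(a, table, idx)`: an index below 64 selects a byte from the combined table, while an out-of-range index leaves the corresponding byte of `a` unchanged. Because any lane may select any of the 64 table bytes, the lookup needs the whole table at once, which is consistent with the dump representing it as a single XI pseudo.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: xi_t models the 512-bit XI value,
   v16qi_t models one 128-bit V16QI vector. */
typedef struct { uint8_t b[64]; } xi_t;
typedef struct { uint8_t b[16]; } v16qi_t;

/* Pack four 16-byte vectors into one 64-byte value at offsets
   0, 16, 32, 48, like the XI#0/#16/#32/#48 subreg writes. */
static xi_t build_xi(v16qi_t t0, v16qi_t t1, v16qi_t t2, v16qi_t t3)
{
    xi_t xi;
    memcpy(xi.b + 0, t0.b, 16);
    memcpy(xi.b + 16, t1.b, 16);
    memcpy(xi.b + 32, t2.b, 16);
    memcpy(xi.b + 48, t3.b, 16);
    return xi;
}

/* Scalar model of a 4-register TBX: each index < 64 selects a byte
   from the 64-byte table; larger indices keep the byte from a. */
static v16qi_t tbx4(v16qi_t a, xi_t table, v16qi_t idx)
{
    v16qi_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = idx.b[i] < 64 ? table.b[idx.b[i]] : a.b[i];
    return r;
}
```

Mirroring the bug's call, `x1 = tbx4(x1, build_xi(t0, t1, t2, t3), x1);` passes x1 as both the fallback and the index vector, just as r100 appears twice in the unspec of insn 38.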