https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753
--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Wilco from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > lower-subreg should have been able to help here. I wonder why it did not
> > > ...
> >
> > I'm not sure how it can help.
>
> I think you misunderstood what this pass does.
> It does exactly what you think it should do:
> /* Decompose multi-word pseudo-registers into individual
> pseudo-registers when possible and profitable. This is possible
> when all the uses of a multi-word register are via SUBREG, or are
> copies of the register to another location. Breaking apart the
> register permits more CSE and permits better register allocation.
>
> The only difference is the creating part, which is missing.
Yes, but the issue is that you can't remove all the subregs, since the TBX
instructions really need a 512-bit register. The slim dump for x1 =
vqtbx4q_u8(x1, table, x1):
30: r94:XI#0=r105:V16QI
31: r95:XI=r94:XI
REG_DEAD r94:XI
32: r95:XI#16=r101:V16QI
33: r96:XI=r95:XI
REG_DEAD r95:XI
34: r96:XI#32=r102:V16QI
35: r97:XI=r96:XI
REG_DEAD r96:XI
36: r97:XI#48=r106:V16QI
38: r100:V16QI=unspec[r100:V16QI,r97:XI,r100:V16QI] 186
REG_DEAD r97:XI
As you can see, it creates the 512-bit XI register via a complex sequence of
four subreg lvalues (at byte offsets 0, 16, 32 and 48), with a full XI copy
between each one (insns 31, 33 and 35).
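For readers less used to RTL, here is a rough C model of what the dump above does. It is only a sketch: the type names `v16qi` and `xi`, and the helpers `build_xi` and `tbx4`, are hypothetical. RTL modes are modelled as plain byte arrays, and none of this is GCC code or the arm_neon.h implementation. `build_xi` mirrors insns 30 to 36 (write one 16-byte lane, copy the whole 64-byte value, repeat). `tbx4` is a scalar model of TBX with a four-register table: an index below 64 selects a table byte, and an out-of-range index leaves the destination byte unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the RTL modes: V16QI is 16 bytes,
   XI is 64 bytes (four V16QI lanes back to back). */
typedef struct { uint8_t b[16]; } v16qi;
typedef struct { uint8_t b[64]; } xi;

static_assert(sizeof(xi) == 64, "XI models a 512-bit register");

/* Mirrors insns 30-36: each subreg lvalue writes one 16-byte lane,
   and each lane write is followed by a copy of the whole XI value. */
static xi build_xi(v16qi t0, v16qi t1, v16qi t2, v16qi t3)
{
    xi r94, r95, r96, r97;
    memset(&r94, 0, sizeof r94);   /* RTL leaves the other bytes undefined */
    memcpy(r94.b + 0, t0.b, 16);   /* 30: r94:XI#0 = r105:V16QI */
    r95 = r94;                     /* 31: r95:XI = r94:XI */
    memcpy(r95.b + 16, t1.b, 16);  /* 32: r95:XI#16 = r101:V16QI */
    r96 = r95;                     /* 33: r96:XI = r95:XI */
    memcpy(r96.b + 32, t2.b, 16);  /* 34: r96:XI#32 = r102:V16QI */
    r97 = r96;                     /* 35: r97:XI = r96:XI */
    memcpy(r97.b + 48, t3.b, 16);  /* 36: r97:XI#48 = r106:V16QI */
    return r97;
}

/* Scalar model of TBX with a 64-byte table (vqtbx4q_u8 (a, table, idx)):
   in-range indices select a table byte, out-of-range ones keep a's byte. */
static v16qi tbx4(v16qi a, xi table, v16qi idx)
{
    v16qi r = a;
    for (int i = 0; i < 16; i++)
        if (idx.b[i] < 64)
            r.b[i] = table.b[idx.b[i]];
    return r;
}
```

In the call from this report, x1 serves as both `a` and `idx`. The straight-line form of `build_xi` keeps the three whole-register copies visible, since those are the part of the sequence that the subreg writes force.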