https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753

--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Wilco from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > lower-subreg should have been able to help here.  I wonder why it did not
> > > ...
> > 
> > I'm not sure how it can help. 
> 
> I think you misunderstood what this pass does.
> It does exactly what you think it should do:
> /* Decompose multi-word pseudo-registers into individual
>    pseudo-registers when possible and profitable.  This is possible
>    when all the uses of a multi-word register are via SUBREG, or are
>    copies of the register to another location.  Breaking apart the
>    register permits more CSE and permits better register allocation.
> 
> The only difference is the creating part, which is missing.

Yes, but the issue is that you can't remove all the subregs, since the TBX
instructions really need a 512-bit register. The slim dump for
x1 = vqtbx4q_u8(x1, table, x1):

   30: r94:XI#0=r105:V16QI
   31: r95:XI=r94:XI
      REG_DEAD r94:XI
   32: r95:XI#16=r101:V16QI
   33: r96:XI=r95:XI
      REG_DEAD r95:XI
   34: r96:XI#32=r102:V16QI
   35: r97:XI=r96:XI
      REG_DEAD r96:XI
   36: r97:XI#48=r106:V16QI
   38: r100:V16QI=unspec[r100:V16QI,r97:XI,r100:V16QI] 186
      REG_DEAD r97:XI

As you can see, it creates the 512-bit XI register via a complex sequence of 4
subreg lvalues, with a full XI copy into a fresh pseudo between each of them.
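
For reference, a rough scalar model of what vqtbx4q_u8 computes, to show why
the table operand has to be one 512-bit value rather than four independent
128-bit ones. This is only a sketch: table64_t and tbx4_model are made-up
names (not GCC or ACLE code), and the struct merely stands in for
uint8x16x4_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint8x16x4_t: four 16-byte vectors. */
typedef struct { uint8_t val[4][16]; } table64_t;

/* Hypothetical scalar model of vqtbx4q_u8 (a, t, idx). */
static void tbx4_model(uint8_t out[16], const uint8_t a[16],
                       const table64_t *t, const uint8_t idx[16])
{
    uint8_t flat[64];

    /* The lookup treats the four vectors as one contiguous 64-byte
       (512-bit) table, which is why the insn wants a single XI operand. */
    assert(sizeof t->val == sizeof flat);
    memcpy(flat, t->val, sizeof flat);

    for (int i = 0; i < 16; i++) {
        /* In-range indices select a table byte; out-of-range indices
           keep the corresponding byte of the first operand. */
        out[i] = idx[i] < 64 ? flat[idx[i]] : a[i];
    }
}
```

Any index can land in any of the four vectors, so all 64 bytes must be live
in one multi-register value when the instruction executes.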
