[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload

rsandifo at gcc dot gnu.org Thu, 27 Aug 2020 04:45:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796


rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #6 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Alex Coplan from comment #5)
> Started with this change:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8eaff6ef97836100801f7b40dc03f77fbebe03ac
Ah, yeah.  What the patch does looks good, but it seems to be
exposing a latent problem with subreg reloads.

The cycling starts with:

----------------------------------------------------------------------------
Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
      Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
      Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
      REG_DEAD r196:DI
    Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0
----------------------------------------------------------------------------

where we now (IMO justifiably) have two reload moves, one for the
memory load and one for the subreg.  Next we have:

----------------------------------------------------------------------------
Changing pseudo 196 in operand 3 of insn 103 on equiv [r105:DI*0x8+r140:DI]
         Reuse r287 for reload [r105:DI*0x8+r140:DI], change to class POINTER_AND_FP_REGS for r287
         Reuse r288 for reload r287:DI#0, change to class POINTER_AND_FP_REGS for r288
            1 Non pseudo reload: reject++
            3 Non pseudo reload: reject++
          alt=0,overall=2,losers=0,rld_nregs=0
         Choosing alt 0 in insn 103:  (0) =r  (1) r  (2) n  (3) r {*add_lsl_si}
      Change to class GENERAL_REGS for r288
----------------------------------------------------------------------------

POINTER_AND_FP_REGS is the class that aarch64 prefers for the reload,
again IMO justifiably.  This then gets narrowed to GENERAL_REGS for
the main reload register (r288) because of the use in the *add_lsl_si
instruction.  But we're then left with a situation in which r287 has
class POINTER_AND_FP_REGS and is only used in moves.  In practice,
each move alternative will require either POINTER_REGS or FP_REGS,
but there's nothing to pin r287 down to a particular one, and we end
up oscillating between them.
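
As a rough illustration of why r287 is not pinned down, here is a toy
model in Python that treats a register class as a set of register names.
The register names and the fits() helper are assumptions made up for the
sketch; only the class names and the "r"/"w" constraint letters come
from the dumps above.  It is not LRA code.

```python
# Toy model: a register class is a set of register names.
# The names below are illustrative, not the full aarch64 register file.
GENERAL_REGS = frozenset({"x0", "x1", "x2"})
POINTER_REGS = GENERAL_REGS | {"sp"}
FP_REGS = frozenset({"v0", "v1", "v2"})
POINTER_AND_FP_REGS = POINTER_REGS | FP_REGS

# Classes accepted by the two constraint letters seen in the dumps.
CONSTRAINT_CLASS = {"r": GENERAL_REGS, "w": FP_REGS}


def fits(pseudo_class, constraint):
    """True if every register the pseudo may get satisfies the constraint."""
    return pseudo_class <= CONSTRAINT_CLASS[constraint]


# r287 keeps the wide class, r288 was narrowed by *add_lsl_si.
r287_fits = {c: fits(POINTER_AND_FP_REGS, c) for c in CONSTRAINT_CLASS}
r288_fits = {c: fits(GENERAL_REGS, c) for c in CONSTRAINT_CLASS}
print(r287_fits)
print(r288_fits)
```

In this model r288 matches "r" outright, while r287 matches neither "r"
nor "w" as it stands, so every move that mentions r287 needs some reload
whichever alternative is picked.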

More specifically, we reload insn 316 as follows:

----------------------------------------------------------------------------
         Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
      Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
    Inserting insn reload after:
  318: r287:DI=r289:DI
----------------------------------------------------------------------------

Here we've effectively chosen to use GENERAL_REGS for the r287 reload,
but made the choice via a new reload register (r289).  Next we do:

----------------------------------------------------------------------------
         Choosing alt 13 in insn 318:  (0) w  (1) rZ {*movdi_aarch64}
      Creating newreg=290 from oldreg=287, assigning class FP_REGS to r290
  318: r290:DI=r289:DI
    Inserting insn reload after:
  319: r287:DI=r290:DI
----------------------------------------------------------------------------

Here we've eschewed the r<-r alternative because of the risk of cycling,
so this time we've effectively chosen to use FP_REGS for r287 (instead
of GENERAL_REGS as above).  This choice too is made via a new reload
register (r290).  We manage to break a potential cycle here, but we've
still left r287 as POINTER_AND_FP_REGS.

Next we move on to the second of the original two reload instructions:

----------------------------------------------------------------------------
         Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
      Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
    Inserting insn reload before:
  320: r291:SI=r287:DI#0
----------------------------------------------------------------------------

Here too we've rejected r<-r because of potential cycling, and
so have effectively chosen to put r287 in FP_REGS.  The “problem”
is that this time we've reloaded the subreg input rather than the
register output, and so we have the same problem when reloading
the subreg the next time round.

IMO the handling of the first reload shows that it would be better
to restrict the class of r287 rather than generate a new reload
register r289.  Doing that might then require a reload in the uses
of r287, but that might happen anyway, since the new class would
still be a subset of the old class, and so any register chosen
for the new class could also have been chosen for the old class.
At least we'd be making forward progress by restricting the class,
and we'd avoid unnecessary moves via the FP register bank.
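
The forward-progress argument can be sketched with the same kind of toy
set model.  The narrow() helper and the register names are hypothetical
and only illustrate the idea of restricting a class; this is not the
patch being tested, which this comment does not show.

```python
# Toy model: a register class is a set of register names (illustrative only).
GENERAL_REGS = frozenset({"x0", "x1", "x2"})
FP_REGS = frozenset({"v0", "v1", "v2"})
POINTER_AND_FP_REGS = GENERAL_REGS | {"sp"} | FP_REGS


def narrow(old_class, wanted_class):
    """Restrict old_class to the registers that wanted_class also allows."""
    new_class = old_class & wanted_class
    if not new_class:
        raise ValueError("classes do not overlap; a real reload is needed")
    return new_class


# Insn 316 chose the "r" alternative: narrow r287 instead of creating r289.
r287_class = narrow(POINTER_AND_FP_REGS, GENERAL_REGS)

# The result is a strict subset of the old class, so the class can only
# shrink a finite number of times before it stops changing.
print(sorted(r287_class))
print(r287_class < POINTER_AND_FP_REGS)
```

Once r287 is narrowed like this, a later move that wants "w" for r287
would need a real reload in this model, which corresponds to the "might
then require a reload in the uses of r287" caveat above.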

I'm testing a patch.

[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload
