https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69461

--- Comment #9 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Alexandre Oliva from comment #6)
> Created attachment 37498 [details]
> Patch I'm testing to fix the bug
> 
> LRA wants harder than reload to avoid creating a stack slot to satisfy insn
> constraints.  As a result, it creates an additional REG:TI pseudo to reload
> a SUBREG:V2DF of a REG:TI, and then it tries to assign that pseudo to
> VSX_REGS, which in turn causes another reload because there's no way to load
> a TImode value into a VSX_REG in *mov<mode>_ppc64, and that requires
> another, and so on, until the limit on reload insns is exceeded.
> 
> The first problem is that we shouldn't be creating a TImode reload for
> VSX_REGS, since we can't possibly satisfy that: TImode values are not ok for
> VSX_REGS.  I've adjusted in_class_p to check HARD_REGNO_MODE_OK, and that
> put an end to infinite stream of reloads.
> 
> It was still a very long stream, though.  simplify_operand_subreg attempts
> to turn SUBREGs of MEMs into MEMs, but it will only proceed with the
> simplification if the resulting address is at least as valid as the
> original.  
> 
> Alas, instead of the simplification, we end up repeatedly generating reloads
> copying the initial value to stack slots with growing offsets, until the
> offset grows enough that the address becomes invalid, at which point the
> subreg simplification is performed.  That's 2047 excess stores and loads,
> plus insns that compute the stack address for each of them.
> 
> In order to fix that, I amended the test on whether to proceed with the
> subreg simplification to take into account the availability of regs that can
> hold a value of the intended mode in the goal class for that operand.
> 
> With that, we go from 2047 excess stores and loads to only 1.  I couldn't
> figure out yet how to get rid of this one extra store and load, and the
> excess stack slot, but I figured I'd share what I have, that I believe to be
> a solid fix, and save the investigation on an additional LRA improvement for
> later.

Alex, thanks for working on this.  The location of the fix is right.  But I
have a smaller, more general fix.  Instead of looking at the operand, I
propose to check whether the address will be valid when we use just the base
reg (LRA can do this transformation later).  The original address is valid, as
TImode permits such an address (reg + offset).  For V2DFmode it is invalid, as
the mode permits only a reg[+reg] address; therefore we fail to remove the
subreg.  Adding a check that the bare base address is valid resolves the issue.
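
To make the reasoning concrete, here is a minimal toy model of the decision in
C.  It is not LRA code and not the actual patch: every name (toy_mode,
toy_addr, toy_addr_valid_p, old_simplify_p, new_simplify_p) is hypothetical,
and the validity rules encode only what is stated above, namely that TImode
accepts reg + offset while V2DFmode accepts only reg[+reg].

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in GCC.  */
enum toy_mode { TOY_TI, TOY_V2DF };
enum toy_addr { ADDR_REG, ADDR_REG_PLUS_REG, ADDR_REG_PLUS_OFFSET };

/* Assumed rules: TImode permits reg + offset; V2DFmode permits only
   reg[+reg].  Plain reg and reg + reg are treated as valid for both.  */
static bool
toy_addr_valid_p (enum toy_mode mode, enum toy_addr addr)
{
  if (addr == ADDR_REG_PLUS_OFFSET)
    return mode == TOY_TI;
  return true;
}

/* Existing test: turn the SUBREG of a MEM into a MEM only if the address
   in the outer mode is at least as valid as it was in the inner mode.  */
static bool
old_simplify_p (enum toy_mode inner, enum toy_mode outer, enum toy_addr addr)
{
  return toy_addr_valid_p (outer, addr) || !toy_addr_valid_p (inner, addr);
}

/* Proposed extra check: also proceed when the bare base register would be
   a valid address in the outer mode, since the address can be reduced to
   that form later.  */
static bool
new_simplify_p (enum toy_mode inner, enum toy_mode outer, enum toy_addr addr)
{
  return old_simplify_p (inner, outer, addr)
         || toy_addr_valid_p (outer, ADDR_REG);
}
```

In this model, the case from the bug (inner TOY_TI, outer TOY_V2DF, address
ADDR_REG_PLUS_OFFSET) is rejected by old_simplify_p and accepted by
new_simplify_p.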

I'll test my patch on several machines and commit with your test if everything
is ok.
