Hi Segher,

Gentle ping.

Is the combine change (a canonicalization fix, as described below) OK
for trunk in light of this info?

On 22/09/2020 17:08, Richard Sandiford wrote:
> Segher Boessenkool <seg...@kernel.crashing.org> writes:
> > Hi Alex,
> >
> > On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
> >> On 21/09/2020 18:35, Segher Boessenkool wrote:
> >> Thanks for doing this testing. The results look good, then: no code size
> >> changes and no build regressions.
> >
> > No *code* changes.  I cannot test aarch64 like this.
> >
> >> > So, there is no difference for most targets (I checked some targets and
> >> > there really is no difference).  The only exception is aarch64 (which
> >> > the kernel calls "arm64"): the unpatched compiler ICEs!  (At least three
> >> > times, even).
> >> 
> >> Indeed, this is the intended purpose of the patch, see the PR (96998).
> >
> > You want to fix an ICE in LRA caused by an instruction created by LRA,
> > with a patch to combine?!  That doesn't sound right.
> >
> > If what you want to do is a) fix the backend bug, and then b) get some
> > extra performance, then do *that*, and keep the patches separate.
> 
> This patch isn't supposed to be a performance optimisation.  It's supposed
> to be a canonicalisation improvement.
> 
> The situation as things stand is that aarch64 has a bug: it accepts
> an odd sign_extract representation of addresses, but doesn't accept
> that same odd form of address as an LEA.  We have two options:
> 
> (a) add back instructions that recognise the odd form of LEA, or
> (b) remove the code that accepts the odd addresses
> 
> I think (b) is the way to go here.  But doing that on its own
> would regress code quality.  The reason we recognised the odd
> addresses in the first place was because that was the rtl that
> combine happened to generate for an important case.
> 
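> (For illustration, a minimal sketch of the kind of source involved, and not
> the testcase from PR 96998: indexing an array of 32-bit elements with a
> 32-bit index,
> 
>   /* "index" has SImode, so addressing base[index] on a 64-bit target
>      needs base + sign_extend(index) * 4.  */
>   int
>   load (int *base, int index)
>   {
>     return base[index];
>   }
> 
> which is the sort of code that gives rise to the address arithmetic in the
> combine dump below.)
> 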
> Normal operating procedure is to add patterns and address-matching
> code that accepts whatever combine happens to throw at the target,
> regardless of how sensible the representation is.  But sometimes I think
> we should instead ask whether the representation that combine is using
> is the right one.  And IMO this is one such case.
> 
> At the moment combine does this:
> 
> Trying 8 -> 9:
>     8: r98:DI=sign_extend(r92:SI)
>       REG_DEAD r92:SI
>     9: [r98:DI*0x4+r96:DI]=asm_operands
>       REG_DEAD r98:DI
> Failed to match this instruction:
> (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
>                     (const_int 4 [0x4]))
>                 (const_int 34 [0x22])
>                 (const_int 0 [0]))
>             (reg/f:DI 96)) [3 *i_5+0 S4 A32])
>     (asm_operands:SI ("") ("=Q") 0 []
>          []
>          [] /tmp/foo.c:13))
> allowing combination of insns 8 and 9
> original costs 4 + 4 = 8
> replacement cost 4
> 
> and so that's one of the forms that the aarch64 address code accepts.
> But a natural substitution would simply replace r98 with the rhs of
> the set:
> 
>   (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92))
>                                  (const_int 4))
>                         (reg:DI 96)))
>        ...)
> 
> The only reason we don't do that is because the substitution
> and simplification go through the expand_compound_operation/
> make_compound_operation process.
> 
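> (Roughly, and as an assumed sketch of that process rather than a quote from
> combine: expand_compound_operation rewrites an extension such as
> (sign_extend:DI (reg:SI 92)) as a pair of shifts, along the lines of
> 
>   (ashiftrt:DI (ashift:DI (subreg:DI (reg:SI 92) 0)
>                           (const_int 32))
>                (const_int 32))
> 
> and make_compound_operation later rebuilds a compound operation from the
> simplified result; when the expression appears inside a mem, that rebuild
> uses mult rather than ashift, which is how the sign_extract form in the
> dump above arises.)
> 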
> The corresponding (ashift ... (const_int 2)) *does* end up using
> the natural sign_extend form rather than sign_extract form.
> The only reason we get this (IMO) weird behaviour for mult is
> the rule that shifts have to be mults in addresses.  Some code
> (like the code being patched) instead expects ashift to be the
> canonical form in all situations.
> 
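> For example (an illustration based on the forms above, not taken from a
> dump), outside an address the combined expression would be written as
> 
>   (ashift:DI (sign_extend:DI (reg:SI 92)) (const_int 2))
> 
> but inside a mem the canonical way to write that same shift is
> 
>   (mult:DI (sign_extend:DI (reg:SI 92)) (const_int 4))
> 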
> If we make the substitution work “naturally” for mult as well as
> ashift, we can remove the address-matching code that has no
> corresponding LEA pattern, and make the aarch64 address code
> self-consistent that way instead.
> 
> Thanks,
> Richard

Thanks,
Alex
