Hi Segher,

Gentle ping.
Is the combine change (a canonicalization fix, as described below) OK
for trunk in light of this info?

On 22/09/2020 17:08, Richard Sandiford wrote:
> Segher Boessenkool <seg...@kernel.crashing.org> writes:
> > Hi Alex,
> >
> > On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
> >> On 21/09/2020 18:35, Segher Boessenkool wrote:
> >> Thanks for doing this testing. The results look good, then: no code size
> >> changes and no build regressions.
> >
> > No *code* changes. I cannot test aarch64 like this.
> >
> >> > So, there is no difference for most targets (I checked some targets and
> >> > there really is no difference). The only exception is aarch64 (which
> >> > the kernel calls "arm64"): the unpatched compiler ICEs! (At least three
> >> > times, even).
> >>
> >> Indeed, this is the intended purpose of the patch, see the PR (96998).
> >
> > You want to fix an ICE in LRA caused by an instruction created by LRA,
> > with a patch to combine?! That doesn't sound right.
> >
> > If what you want to do is a) fix the backend bug, and then b) get some
> > extra performance, then do *that*, and keep the patches separate.
>
> This patch isn't supposed to be a performance optimisation. It's supposed
> to be a canonicalisation improvement.
>
> The situation as things stand is that aarch64 has a bug: it accepts
> an odd sign_extract representation of addresses, but doesn't accept
> that same odd form of address as an LEA. We have two options:
>
> (a) add back instructions that recognise the odd form of LEA, or
> (b) remove the code that accepts the odd addresses
>
> I think (b) is the way to go here. But doing that on its own
> would regress code quality. The reason we recognised the odd
> addresses in the first place was because that was the rtl that
> combine happened to generate for an important case.
>
> Normal operating procedure is to add patterns and address-matching
> code that accepts whatever combine happens to throw at the target,
> regardless of how sensible the representation is. But sometimes I think
> we should instead think about whether the representation that combine is
> using is the right one. And IMO this is one such case.
>
> At the moment combine does this:
>
> Trying 8 -> 9:
>     8: r98:DI=sign_extend(r92:SI)
>       REG_DEAD r92:SI
>     9: [r98:DI*0x4+r96:DI]=asm_operands
>       REG_DEAD r98:DI
> Failed to match this instruction:
> (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
>                 (const_int 4 [0x4]))
>             (const_int 34 [0x22])
>             (const_int 0 [0]))
>         (reg/f:DI 96)) [3 *i_5+0 S4 A32])
>     (asm_operands:SI ("") ("=Q") 0 []
>          []
>          [] /tmp/foo.c:13))
> allowing combination of insns 8 and 9
> original costs 4 + 4 = 8
> replacement cost 4
>
> and so that's one of the forms that the aarch64 address code accepts.
> But a natural substitution would simply replace r98 with the rhs of
> the set:
>
> (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92))
>                                (const_int 4))
>                       (reg:DI 96)))
>      ...)
>
> The only reason we don't do that is because the substitution
> and simplification go through the expand_compound_operation/
> make_compound_operation process.
>
> The corresponding (ashift ... (const_int 2)) *does* end up using
> the natural sign_extend form rather than sign_extract form.
> The only reason we get this (IMO) weird behaviour for mult is
> the rule that shifts have to be mults in addresses. Some code
> (like the code being patched) instead expects ashift to be the
> canonical form in all situations.
>
> If we make the substitution work “naturally” for mult as well as
> ashift, we can remove the address-matching code that has no
> corresponding LEA pattern, and make the aarch64 address code
> self-consistent that way instead.
>
> Thanks,
> Richard

Thanks,
Alex
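
P.S. For reference, here is a minimal sketch of the kind of source that
leads to the combine attempt quoted above. The function and parameter
names are illustrative only (taken from the register notes in the dump);
it is not necessarily the exact PR 96998 testcase:

    /* foo.c -- illustrative sketch only.  The asm output uses the aarch64
       "Q" constraint (a single base register with no offset), so the
       scaled address base + (int64_t) g * 4 has to be reloaded into a
       plain base register.  */
    void
    foo (int *base, int g)
    {
      int *i = &base[g];
      asm volatile ("" : "=Q" (*i));
    }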