Segher Boessenkool <seg...@kernel.crashing.org> writes:
> Hi Alex,
>
> On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
>> On 21/09/2020 18:35, Segher Boessenkool wrote:
>> Thanks for doing this testing. The results look good, then: no code size
>> changes and no build regressions.
>
> No *code* changes. I cannot test aarch64 like this.
>
>> > So, there is no difference for most targets (I checked some targets and
>> > there really is no difference). The only exception is aarch64 (which
>> > the kernel calls "arm64"): the unpatched compiler ICEs! (At least three
>> > times, even).
>>
>> Indeed, this is the intended purpose of the patch, see the PR (96998).
>
> You want to fix an ICE in LRA caused by an instruction created by LRA,
> with a patch to combine?! That doesn't sound right.
>
> If what you want to do is a) fix the backend bug, and then b) get some
> extra performance, then do *that*, and keep the patches separate.
This patch isn't supposed to be a performance optimisation.  It's
supposed to be a canonicalisation improvement.

The situation as things stand is that aarch64 has a bug: it accepts an
odd sign_extract representation of addresses, but doesn't accept that
same odd form of address as an LEA.  We have two options:

(a) add back instructions that recognise the odd form of LEA, or
(b) remove the code that accepts the odd addresses

I think (b) is the way to go here.  But doing that on its own would
regress code quality.  The reason we recognised the odd addresses in
the first place was because that was the rtl that combine happened to
generate for an important case.

Normal operating procedure is to add patterns and address-matching code
that accept whatever combine happens to throw at the target, regardless
of how sensible the representation is.  But sometimes I think we should
instead think about whether the representation that combine is using is
the right one.  And IMO this is one such case.

At the moment combine does this:

Trying 8 -> 9:
    8: r98:DI=sign_extend(r92:SI)
      REG_DEAD r92:SI
    9: [r98:DI*0x4+r96:DI]=asm_operands
      REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                        (const_int 4 [0x4]))
                    (const_int 34 [0x22])
                    (const_int 0 [0]))
                (reg/f:DI 96)) [3 *i_5+0 S4 A32])
    (asm_operands:SI ("") ("=Q") 0 [] [] [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4

and so that's one of the forms that the aarch64 address code accepts.
But a natural substitution would simply replace r98 with the rhs of
the set:

  (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92))
                                 (const_int 4))
                        (reg:DI 96)))
       ...)

The only reason we don't do that is because the substitution and
simplification go through the expand_compound_operation/
make_compound_operation process.  The corresponding
(ashift ... (const_int 2)) *does* end up using the natural sign_extend
form rather than the sign_extract form.  The only reason we get this
(IMO) weird behaviour for mult is the rule that shifts have to be
mults in addresses.  Some code (like the code being patched) instead
expects ashift to be the canonical form in all situations.

If we make the substitution work "naturally" for mult as well as
ashift, we can remove the address-matching code that has no
corresponding LEA pattern, and make the aarch64 address code
self-consistent that way instead.
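For reference, here's a reduced C sketch of the kind of input behind
the dump above.  It's a hypothetical reconstruction from the dump
rather than the actual PR testcase, and the names are invented:

  /* The store address is base + sign_extend (g) * 4.  The "=Q"
     constraint only accepts a single-base-register address, so the
     scaled address has to be reloaded into a register, which is
     presumably where the unpatched compiler ICEs.  */
  void
  foo (int *base, int g)
  {
    int *i = &base[g];
    asm ("" : "=Q" (*i));
  }

Compiling something like this at -O2 for aarch64 should exercise the
8 -> 9 combination shown above.

Thanks,
Richard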