https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:25332d2325c720f584444c3858efdb85b8a3c06a

commit r12-7259-g25332d2325c720f584444c3858efdb85b8a3c06a
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Wed Feb 16 10:21:13 2022 +0000

    aarch64: Extend PR100056 patterns to +

    pr100056.c contains things like:

        int
        or_shift_u3a (unsigned i)
        {
          i &= 7;
          return i | (i << 11);
        }

    After g:96146e61cd7aee62c21c2845916ec42152918ab7, the preferred
    gimple representation of this is a multiplication:

        i_2 = i_1(D) & 7;
        _5 = i_2 * 2049;

    Expand then open-codes the multiplication back to individual shifts,
    but (of course) it uses + rather than | to combine the shifts.
    This means that we end up with the RTL equivalent of:

        i + (i << 11)

    I wondered about canonicalising the + to | (*back* to | in this case)
    when the operands have no set bits in common and when one of the
    operands is &, | or ^, but that didn't seem to be a popular idea when
    I asked on IRC.  The feeling seemed to be that + is inherently simpler
    than |, so we shouldn't be “simplifying” the other way.

    This patch therefore adjusts the PR100056 patterns to handle +
    as well as |, in cases where the operands are provably disjoint.

    For:

        int
        or_shift_u8 (unsigned char i)
        {
          return i | (i << 11);
        }

    the instructions:

        2: r95:SI=zero_extend(x0:QI)
          REG_DEAD x0:QI
        7: r98:SI=r95:SI<<0xb

    are combined into:

        (parallel [
            (set (reg:SI 98)
                (and:SI (ashift:SI (reg:SI 0 x0 [ i ])
                        (const_int 11 [0xb]))
                    (const_int 522240 [0x7f800])))
            (set (reg/v:SI 95 [ i ])
                (zero_extend:SI (reg:QI 0 x0 [ i ])))
        ])

    which fails to match, but which is then split into its individual
    (independent) sets.  Later the zero_extend is combined with the add
    to get an ADD UXTB:

        (set (reg:SI 99)
            (plus:SI (zero_extend:SI (reg:QI 0 x0 [ i ]))
                (reg:SI 98)))

    This means that there is never a 3-insn combo to match the split
    against.
    The end result is therefore:

        ubfiz   w1, w0, 11, 8
        add     w0, w1, w0, uxtb

    This is a bit redundant, since it's doing the zero_extend twice.
    It is at least 2 instructions though, rather than the 3 that we had
    before the original patch for PR100056.  or_shift_u8_asm is affected
    similarly.

    The net effect is that we do still have 2 UBFIZs, but we're at
    least back down to 2 instructions per function, as for GCC 11.
    I think that's good enough for now.

    There are probably other instructions that should be extended
    to support + as well as | (e.g. the EXTR ones), but those aren't
    regressions and so are GCC 13 material.

    gcc/
            PR target/100056
            * config/aarch64/iterators.md (LOGICAL_OR_PLUS): New iterator.
            * config/aarch64/aarch64.md: Extend the PR100056 patterns
            to handle plus in the same way as ior, if the operands have
            no set bits in common.

    gcc/testsuite/
            PR target/100056
            * gcc.target/aarch64/pr100056.c: XFAIL the original UBFIZ test
            and instead expect two UBFIZs + two ADD UXTBs.
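
For readers who want to check the arithmetic behind "provably disjoint", here is a minimal C sketch. It is not part of the patch or the testsuite. The helper names (or_form, plus_form, mul_form, bits_disjoint, count_mismatches) are made up for illustration, and the loop simply brute-forces the 3-bit and 8-bit input ranges used by or_shift_u3a and or_shift_u8. It only shows that |, + and the multiplication by 2049 agree when the two operands share no set bits, and that they stop agreeing once the operands overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Source form from pr100056.c: combine i and (i << 11) with |.  */
static uint32_t or_form (uint32_t i) { i &= 7; return i | (i << 11); }

/* Form after expand open-codes the multiplication: + instead of |.  */
static uint32_t plus_form (uint32_t i) { i &= 7; return i + (i << 11); }

/* Gimple form quoted in the commit message: 2049 == (1 << 11) + 1.  */
static uint32_t mul_form (uint32_t i) { i &= 7; return i * 2049u; }

/* Hypothetical helper: nonzero if A and B have no set bits in common.  */
static int bits_disjoint (uint32_t a, uint32_t b) { return (a & b) == 0; }

/* Count the inputs for which the forms disagree or the operands overlap.
   Covers the 3-bit case (i & 7) and the 8-bit (unsigned char) case.  */
static int
count_mismatches (void)
{
  int bad = 0;
  for (uint32_t i = 0; i < 256; i++)
    {
      /* 3-bit case: bits 0-2 versus bits 11-13.  */
      if (!bits_disjoint (i & 7, (i & 7) << 11))
        bad++;
      if (or_form (i) != plus_form (i) || or_form (i) != mul_form (i))
        bad++;

      /* 8-bit case: bits 0-7 versus bits 11-18.  */
      if (!bits_disjoint (i, i << 11))
        bad++;
      if ((i | (i << 11)) != (i + (i << 11)))
        bad++;
    }
  return bad;
}

/* Counter-example: with a shift of 1 the operands overlap, so a carry
   makes + differ from |.  */
static void
check_overlap_counterexample (void)
{
  uint32_t i = 3;
  assert (!bits_disjoint (i, i << 1));
  assert ((i | (i << 1)) == 7);
  assert ((i + (i << 1)) == 9);
}
```

Calling count_mismatches () should return 0, and check_overlap_counterexample () shows why the patterns can only treat plus like ior when the operands are known to have no set bits in common.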