> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, July 7, 2025 12:55 PM
> To: Kyrylo Tkachov <ktkac...@nvidia.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Earnshaw
> <richard.earns...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
> Pinski <pins...@gmail.com>
> Subject: Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> >> Hi all,
> >>
> >> To handle DImode BCAX operations we want to do them on the SIMD side
> >> only if the incoming arguments don't require a cross-bank move.
> >> This means we need to split back the combination to separate GP BIC+EOR
> >> instructions if the operands are expected to be in GP regs through reload.
> >> The split happens pre-reload if we already know that the destination will
> >> be a GP reg.  Otherwise if reload decides to use the "=r,r" alternative
> >> we ensure operand 0 is early-clobber.
> >> This scheme is similar to how we handle the BSL operations elsewhere in
> >> aarch64-simd.md.
> >>
> >> Thus, for the functions:
> >> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> >> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
> >>
> >> we now generate the desired:
> >> bcax_d_gp:
> >>         bic     x1, x1, x2
> >>         eor     x0, x1, x0
> >>         ret
> >>
> >> bcax_d:
> >>         bcax    v0.16b, v0.16b, v1.16b, v2.16b
> >>         ret
> >>
> >> When the inputs are in SIMD regs we use BCAX and when they are in GP regs
> >> we don't force them to SIMD with extra moves.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Ok for trunk?
> >> Thanks,
> >> Kyrill
> >>
> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> >>
> >> gcc/
> >>
> >> 	* config/aarch64/aarch64-simd.md (*bcaxqdi4): New
> >> 	define_insn_and_split.
> >>
> >> gcc/testsuite/
> >>
> >> 	* gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
> >>
> >> From 95268cff1261a7724190dd291f9fcb5a7c817917 Mon Sep 17 00:00:00 2001
> >> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> >> Date: Thu, 3 Jul 2025 09:45:02 -0700
> >> Subject: [PATCH 3/7] aarch64: Handle DImode BCAX operations
> >>
> >> To handle DImode BCAX operations we want to do them on the SIMD side
> >> only if the incoming arguments don't require a cross-bank move.
> >> This means we need to split back the combination to separate GP BIC+EOR
> >> instructions if the operands are expected to be in GP regs through reload.
> >> The split happens pre-reload if we already know that the destination will
> >> be a GP reg.  Otherwise if reload decides to use the "=r,r" alternative
> >> we ensure operand 0 is early-clobber.
> >> This scheme is similar to how we handle the BSL operations elsewhere in
> >> aarch64-simd.md.
> >>
> >> Thus, for the functions:
> >> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> >> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
> >>
> >> we now generate the desired:
> >> bcax_d_gp:
> >>         bic     x1, x1, x2
> >>         eor     x0, x1, x0
> >>         ret
> >>
> >> bcax_d:
> >>         bcax    v0.16b, v0.16b, v1.16b, v2.16b
> >>         ret
> >>
> >> When the inputs are in SIMD regs we use BCAX and when they are in GP regs
> >> we don't force them to SIMD with extra moves.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >>
> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> >>
> >> gcc/
> >>
> >> 	* config/aarch64/aarch64-simd.md (*bcaxqdi4): New
> >> 	define_insn_and_split.
> >>
> >> gcc/testsuite/
> >>
> >> 	* gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
> >> ---
> >>  gcc/config/aarch64/aarch64-simd.md            | 29 +++++++++++++++++++
> >>  .../gcc.target/aarch64/simd/bcax_d.c          |  6 +++-
> >>  2 files changed, 34 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> index 4493e55603d..be6a16b4be8 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -9252,6 +9252,35 @@
> >>    [(set_attr "type" "crypto_sha3")]
> >>  )
> >>
> >> +(define_insn_and_split "*bcaxqdi4"
> >> +  [(set (match_operand:DI 0 "register_operand" "=w,&r")
> >> +	(xor:DI
> >> +	  (and:DI
> >> +	    (not:DI (match_operand:DI 3 "register_operand" "w,r"))
> >> +	    (match_operand:DI 2 "register_operand" "w,r"))
> >> +	  (match_operand:DI 1 "register_operand" "w,r")))]
> >
> > I think the constraint on operand 1 should be "w,r0", so that we allow
> > operand 1 to be the same as operand 0.  Without that, and with split1
> > disabled/sidelined, we would end up with an extra move for:
> >
> >   uint64_t f(uint64_t x0, uint64_t x1, uint64_t x2) {
> >     return x0 ^ (x1 & ~x2);
> >   }
> >
> > (The only reason split1 avoids the extra move is that combine combines
> > the hard register copy into the *bcaxqdi4, which is a bit dubious from
> > an RA perspective.)
>
> Sigh.  Wrong way round, of course: it's operands 2 and 3 that can be "w,r0".
A question for my own understanding: from an RA perspective, can the tied
alternative end up with the same cost as the plain "r"?  I was wondering
whether "w,0r" vs. "w,r0" makes a difference.

Thanks,
Tamar

> Richard