> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, July 7, 2025 12:55 PM
> To: Kyrylo Tkachov <ktkac...@nvidia.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Earnshaw
> <richard.earns...@arm.com>; Alex Coplan <alex.cop...@arm.com>; Andrew
> Pinski <pins...@gmail.com>
> Subject: Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations
> 
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> >> Hi all,
> >>
> >> To handle DImode BCAX operations we want to do them on the SIMD side only if
> >> the incoming arguments don't require a cross-bank move.
> >> This means we need to split back the combination to separate GP BIC+EOR
> >> instructions if the operands are expected to be in GP regs through reload.
> >> The split happens pre-reload if we already know that the destination will be
> >> a GP reg.  Otherwise, if reload decides to use the "=r,r" alternative, we
> >> ensure operand 0 is early-clobber.
> >> This scheme is similar to how we handle the BSL operations elsewhere in
> >> aarch64-simd.md.
> >>
> >> Thus, for the functions:
> >> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> >> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
> >>
> >> we now generate the desired:
> >> bcax_d_gp:
> >> bic x1, x1, x2
> >> eor x0, x1, x0
> >> ret
> >>
> >> bcax_d:
> >> bcax v0.16b, v0.16b, v1.16b, v2.16b
> >> ret
> >>
> >> When the inputs are in SIMD regs we use BCAX and when they are in GP regs we
> >> don't force them to SIMD with extra moves.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Ok for trunk?
> >> Thanks,
> >> Kyrill
> >>
> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> >>
> >> gcc/
> >>
> >>    * config/aarch64/aarch64-simd.md (*bcaxqdi4): New
> >>    define_insn_and_split.
> >>
> >> gcc/testsuite/
> >>
> >>    * gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
> >>
> >> From 95268cff1261a7724190dd291f9fcb5a7c817917 Mon Sep 17 00:00:00 2001
> >> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> >> Date: Thu, 3 Jul 2025 09:45:02 -0700
> >> Subject: [PATCH 3/7] aarch64: Handle DImode BCAX operations
> >>
> >> To handle DImode BCAX operations we want to do them on the SIMD side only if
> >> the incoming arguments don't require a cross-bank move.
> >> This means we need to split back the combination to separate GP BIC+EOR
> >> instructions if the operands are expected to be in GP regs through reload.
> >> The split happens pre-reload if we already know that the destination will be
> >> a GP reg.  Otherwise, if reload decides to use the "=r,r" alternative, we
> >> ensure operand 0 is early-clobber.
> >> This scheme is similar to how we handle the BSL operations elsewhere in
> >> aarch64-simd.md.
> >>
> >> Thus, for the functions:
> >> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> >> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
> >>
> >> we now generate the desired:
> >> bcax_d_gp:
> >>         bic     x1, x1, x2
> >>         eor     x0, x1, x0
> >>         ret
> >>
> >> bcax_d:
> >>         bcax    v0.16b, v0.16b, v1.16b, v2.16b
> >>         ret
> >>
> >> When the inputs are in SIMD regs we use BCAX and when they are in GP regs we
> >> don't force them to SIMD with extra moves.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >>
> >> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> >>
> >> gcc/
> >>
> >>    * config/aarch64/aarch64-simd.md (*bcaxqdi4): New
> >>    define_insn_and_split.
> >>
> >> gcc/testsuite/
> >>
> >>    * gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
> >> ---
> >>  gcc/config/aarch64/aarch64-simd.md            | 29 +++++++++++++++++++
> >>  .../gcc.target/aarch64/simd/bcax_d.c          |  6 +++-
> >>  2 files changed, 34 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> index 4493e55603d..be6a16b4be8 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -9252,6 +9252,35 @@
> >>    [(set_attr "type" "crypto_sha3")]
> >>  )
> >>
> >> +(define_insn_and_split "*bcaxqdi4"
> >> +  [(set (match_operand:DI 0 "register_operand" "=w,&r")
> >> +  (xor:DI
> >> +    (and:DI
> >> +      (not:DI (match_operand:DI 3 "register_operand" "w,r"))
> >> +      (match_operand:DI 2 "register_operand" "w,r"))
> >> +    (match_operand:DI 1 "register_operand" "w,r")))]
> >
> > I think the constraint on operand 1 should be "w,r0", so that we allow
> > operand 1 to be the same as operand 0.  Without that, and with split1
> > disabled/sidelined, we would end up with an extra move for:
> >
> >   uint64_t f(uint64_t x0, uint64_t x1, uint64_t x2) {
> >     return x0 ^ (x1 & ~x2);
> >   }
> >
> > (The only reason split1 avoids the extra move is that combine combines
> > the hard register copy into the *bcaxqdi4, which is a bit dubious from
> > an RA perspective.)
> 
> Sigh.  Wrong way round, of course: it's operands 2 and 3 that can be "w,r0".
> 

Question for my own understanding: from an RA perspective, can the tied
alternative end up with the same cost as the plain "r"?  I was wondering
whether "w,0r" or "w,r0" makes a difference.

Thanks,
Tamar

> Richard
