Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi all,
>
> To handle DImode BCAX operations we want to do them on the SIMD side only
> if the incoming arguments don't require a cross-bank move.  This means we
> need to split the combination back into separate GP BIC+EOR instructions
> if the operands are expected to be in GP regs through reload.  The split
> happens pre-reload if we already know that the destination will be a GP
> reg.  Otherwise, if reload decides to use the "=r,r" alternative, we
> ensure operand 0 is early-clobber.  This scheme is similar to how we
> handle the BSL operations elsewhere in aarch64-simd.md.
>
> Thus, for the functions:
>
> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
>
> we now generate the desired:
>
> bcax_d_gp:
>         bic     x1, x1, x2
>         eor     x0, x1, x0
>         ret
>
> bcax_d:
>         bcax    v0.16b, v0.16b, v1.16b, v2.16b
>         ret
>
> When the inputs are in SIMD regs we use BCAX, and when they are in GP regs
> we don't force them into SIMD regs with extra moves.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>         * config/aarch64/aarch64-simd.md (*bcaxqdi4): New
>         define_insn_and_split.
>
> gcc/testsuite/
>
>         * gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
>
> From 95268cff1261a7724190dd291f9fcb5a7c817917 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Date: Thu, 3 Jul 2025 09:45:02 -0700
> Subject: [PATCH 3/7] aarch64: Handle DImode BCAX operations
>
> To handle DImode BCAX operations we want to do them on the SIMD side only
> if the incoming arguments don't require a cross-bank move.  This means we
> need to split the combination back into separate GP BIC+EOR instructions
> if the operands are expected to be in GP regs through reload.  The split
> happens pre-reload if we already know that the destination will be a GP
> reg.  Otherwise, if reload decides to use the "=r,r" alternative, we
> ensure operand 0 is early-clobber.  This scheme is similar to how we
> handle the BSL operations elsewhere in aarch64-simd.md.
>
> Thus, for the functions:
>
> uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
>
> we now generate the desired:
>
> bcax_d_gp:
>         bic     x1, x1, x2
>         eor     x0, x1, x0
>         ret
>
> bcax_d:
>         bcax    v0.16b, v0.16b, v1.16b, v2.16b
>         ret
>
> When the inputs are in SIMD regs we use BCAX, and when they are in GP regs
> we don't force them into SIMD regs with extra moves.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
>         * config/aarch64/aarch64-simd.md (*bcaxqdi4): New
>         define_insn_and_split.
>
> gcc/testsuite/
>
>         * gcc.target/aarch64/simd/bcax_d.c: Add tests for DImode arguments.
> ---
>  gcc/config/aarch64/aarch64-simd.md              | 29 +++++++++++++++++++
>  .../gcc.target/aarch64/simd/bcax_d.c            |  6 +++-
>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 4493e55603d..be6a16b4be8 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -9252,6 +9252,35 @@
>    [(set_attr "type" "crypto_sha3")]
>  )
>
> +(define_insn_and_split "*bcaxqdi4"
> +  [(set (match_operand:DI 0 "register_operand" "=w,&r")
> +    (xor:DI
> +      (and:DI
> +        (not:DI (match_operand:DI 3 "register_operand" "w,r"))
> +        (match_operand:DI 2 "register_operand" "w,r"))
> +      (match_operand:DI 1 "register_operand" "w,r")))]
I think the constraint on operand 1 should be "w,r0", so that we allow
operand 1 to be the same as operand 0.  Without that, and with split1
disabled/sidelined, we would end up with an extra move for:

  uint64_t f(uint64_t x0, uint64_t x1, uint64_t x2) { return x0 ^ (x1 & ~x2); }

(The only reason split1 avoids the extra move is that combine combines the
hard-register copy into the *bcaxqdi4, which is a bit dubious from an RA
perspective.)

It would also be good to use the new alternative syntax for
multi-alternative instructions; see the sketch after the quoted patch
below.

OK with those changes, thanks.

Richard

> +  "TARGET_SHA3"
> +  "@
> +   bcax\t%0.16b, %1.16b, %2.16b, %3.16b
> +   #"
> +  "&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> +  [(set (match_dup 4)
> +    (and:DI (not:DI (match_dup 3))
> +            (match_dup 2)))
> +   (set (match_dup 0)
> +    (xor:DI (match_dup 4)
> +            (match_dup 1)))]
> +  {
> +    if (reload_completed)
> +      operands[4] = operands[0];
> +    else if (can_create_pseudo_p ())
> +      operands[4] = gen_reg_rtx (DImode);
> +    else
> +      FAIL;
> +  }
> +  [(set_attr "type" "crypto_sha3,multiple")]
> +)
> +
>  ;; SM3
>
>  (define_insn "aarch64_sm3ss1qv4si"
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/bcax_d.c b/gcc/testsuite/gcc.target/aarch64/simd/bcax_d.c
> index d68f0e102bf..a7640c3f6f1 100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd/bcax_d.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/bcax_d.c
> @@ -7,9 +7,13 @@
>
>  #define BCAX(x,y,z)  ((x) ^ ((y) & ~(z)))
>
> +/* When the inputs come from GP regs don't form a BCAX.  */
> +uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
> +
> +uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
>  uint32x2_t bcax_s (uint32x2_t a, uint32x2_t b, uint32x2_t c) { return BCAX (a, b, c); }
>  uint16x4_t bcax_h (uint16x4_t a, uint16x4_t b, uint16x4_t c) { return BCAX (a, b, c); }
>  uint8x8_t bcax_b (uint8x8_t a, uint8x8_t b, uint8x8_t c) { return BCAX (a, b, c); }
>
> -/* { dg-final { scan-assembler-times {bcax\tv0.16b, v0.16b, v1.16b, v2.16b} 3 } } */
> +/* { dg-final { scan-assembler-times {bcax\tv0.16b, v0.16b, v1.16b, v2.16b} 4 } } */
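As a concrete reference for the two requested changes, the pattern might
end up looking something like the following, with the "r0" matching
constraint on operand 1 and the compact ({@ ... }) multi-alternative
syntax already used elsewhere in the aarch64 backend.  This is a sketch
of the shape only, not necessarily the committed form:

  ;; DImode BCAX: BCAX when all operands are in SIMD regs, otherwise
  ;; split into BIC+EOR on the GP side.
  (define_insn_and_split "*bcaxqdi4"
    [(set (match_operand:DI 0 "register_operand")
          (xor:DI
            (and:DI
              (not:DI (match_operand:DI 3 "register_operand"))
              (match_operand:DI 2 "register_operand"))
            (match_operand:DI 1 "register_operand")))]
    "TARGET_SHA3"
    {@ [ cons: =0 , 1  , 2 , 3 ; attrs: type ]
       [ w        , w  , w , w ; crypto_sha3 ] bcax\t%0.16b, %1.16b, %2.16b, %3.16b
       [ &r       , r0 , r , r ; multiple    ] #
    }
    "&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
    [(set (match_dup 4)
          (and:DI (not:DI (match_dup 3))
                  (match_dup 2)))
     (set (match_dup 0)
          (xor:DI (match_dup 4)
                  (match_dup 1)))]
    {
      if (reload_completed)
        operands[4] = operands[0];
      else if (can_create_pseudo_p ())
        operands[4] = gen_reg_rtx (DImode);
      else
        FAIL;
    }
  )

The extra "0" in operand 1's constraint tells the register allocator that
operand 1 may be tied to the early-clobbered destination, so no copy is
needed when they end up in the same GP register.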