<dhr...@nvidia.com> writes:
> [...]
> +;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a "bswap".
> +;; Match that as well.
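
For reference, the equivalence behind this comment can be sketched in
plain C.  This is only an illustration for a single 16-bit element,
assuming unsigned values (so the right shift is a logical one); the
function names are made up for the example and are not part of the
patch.  __builtin_bswap16 is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value with a left shift, a logical
   right shift and an inclusive OR, i.e. a rotate by 8 bits.  */
uint16_t
rev16_via_shifts (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* Reference byte swap using the compiler builtin.  */
uint16_t
rev16_reference (uint16_t x)
{
  return __builtin_bswap16 (x);
}

/* Return 1 if the two forms agree for every 16-bit input, else 0.  */
int
rev16_forms_agree (void)
{
  for (uint32_t i = 0; i <= UINT16_MAX; i++)
    if (rev16_via_shifts ((uint16_t) i) != rev16_reference ((uint16_t) i))
      return 0;
  return 1;
}

/* Spot check of one value: 0x1234 becomes 0x3412.  */
void
rev16_self_check (void)
{
  assert (rev16_via_shifts (0x1234) == 0x3412);
}
```

The shift-and-OR form is what ends up as a "bswap" on each VNx8HI
element once combine has simplified it.
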
> +(define_insn_and_split "*v_revvnx8hi"
> +  [(parallel
> +    [(set (match_operand:VNx8HI 0 "register_operand")
> +       (bswap:VNx8HI (match_operand 1 "register_operand")))
> +     (clobber (match_scratch:VNx8BI 2))])]

Sorry for not noticing last time, but operand 0 should have a "=w"
constraint, operand 1 should have a "w" constraint, and the match_scratch
should have a "=Upl" constraint.

> +  "TARGET_SVE"
> +  "#"
> +  ""

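The last line (the split condition) should be "&& 1".  A split condition
that starts with "&&" is appended to the insn condition, whereas with an
empty string the TARGET_SVE test doesn't apply to the define_split.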

> +  [(set (match_dup 0)
> +     (unspec:VNx8HI
> +       [(match_dup 2)
> +        (unspec:VNx8HI
> +          [(match_dup 1)]
> +          UNSPEC_REVB)]
> +       UNSPEC_PRED_X))]
> +  {
> +    if (!can_create_pseudo_p ())
> +      operands[2] = CONSTM1_RTX (VNx8BImode);
> +    else
> +      operands[2] = aarch64_ptrue_reg (VNx8BImode);

This should be:

    if (!can_create_pseudo_p ())
      emit_move_insn (operands[2], CONSTM1_RTX (VNx8BImode));
    else
      operands[2] = aarch64_ptrue_reg (VNx8BImode);

That is, after register allocation, the pattern gives us a scratch
predicate register, but we need to initialise it to a ptrue.

> +  }
> +)
> +
>  ;; Predicated integer unary operations.
>  (define_insn "@aarch64_pred_<optab><mode>"
>    [(set (match_operand:SVE_FULL_I 0 "register_operand")
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
> new file mode 100644
> index 00000000000..3a30f80d152
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
> @@ -0,0 +1,83 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+sve" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <arm_sve.h>
> +
> +/*
> +** ror32_sve_lsl_imm:
> +**   ptrue   p3.b, all
> +**   revw    z0.d, p3/m, z0.d

There's no requirement to choose p3 for the predicate, so this would
be better as:

**      ptrue   (p[0-3]).b, all
**      revw    z0.d, \1/m, z0.d

Same for the others.

OK with those changes, thanks.

Richard
