> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Friday, July 11, 2025 4:23 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti 
> <alice.carlo...@arm.com>;
> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>;
> Wilco Dijkstra <wilco.dijks...@arm.com>
> Subject: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]
> 
> This PR is partly about a code quality regression that was triggered
> by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
> gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
> upon either (a) the original permutations not being "native" operations
> or (b) the combined permutation being a "native" operation.
> 
> Whether something is a "native" operation is tested by calling
> can_vec_perm_const_p with allow_variable_p set to false.  This requires
> the permutation to be supported directly by
> TARGET_VECTORIZE_VEC_PERM_CONST,
> rather than falling back to the general vec_perm optab.
> 
> This exposed a problem with the way that we handled general 2-input
> permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
> an instruction to do general 2-input permutations.  We do still implement
> the vec_perm optab for SVE, but only when the vector length is known at
> compile time.  The general expansion is pretty expensive: an AND, a SUB,
> two TBLs, and an ORR.  It certainly couldn't be considered a "native"
> operation.
> 
> However, if a VEC_PERM_EXPR has a constant selector, the indices can
> be wider than the elements being permuted.  This is not true for the
> vec_perm optab, where the indices and permuted elements must have the
> same precision.
> 
> This leads to one case where we cannot leave a general 2-input permutation
> to be handled by the vec_perm optab: when permuting bytes on a target
> with 2048-bit vectors.  In that case, the indices of the elements in
> the second vector are in the range [256, 511], which cannot be stored
> in a byte index.
> 
> TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
> permutations for one specific case.  Rather than check for that
> specific case, the code went ahead and used the vec_perm expansion
> whenever it worked.  But that undermines the !allow_variable_p
> handling in can_vec_perm_const_p; it becomes impossible for
> target-independent code to distinguish "native" operations from
> the worst-case fallback.
> 
> This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
> cases that it has to handle.  It fixes the PR for all vector lengths
> except 2048 bits.
> 
> A better fix would be to introduce some sort of costing mechanism,
> which would allow us to reject the new VEC_PERM_EXPR even for
> 2048-bit targets.  But that would be a significant amount of work
> and would not be backportable.
> 
> Tested on aarch64-linux-gnu.  OK to install?

Ok.

Thanks!

I'm somewhat surprised by
"aarch64_expand_sve_vec_perm does not yet handle variable-length vectors"

I assume cases that we could handle are if the permute values are
a series right? It doesn't seem like we could handle an arbitrary permute with 
VLA.

Cheers,
Tamar

> 
> Richard
> 
> 
> gcc/
>       PR target/121027
>       * config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
>       operations that can be handled by vec_perm.
> 
> gcc/testsuite/
>       PR target/121027
>       * gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
> ---
>  gcc/config/aarch64/aarch64.cc                 | 21 ++++++++++++++-----
>  .../aarch64/sve/acle/general/perm_1.c         | 14 +++++++++++++
>  2 files changed, 30 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 10b8ed5d387..6e16763f957 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -26960,12 +26960,23 @@ aarch64_evpc_tbl (struct expand_vec_perm_d
> *d)
>  static bool
>  aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
>  {
> -  unsigned HOST_WIDE_INT nelt;
> +  if (!d->one_vector_p)
> +    {
> +      /* aarch64_expand_sve_vec_perm does not yet handle variable-length
> +      vectors.  */
> +      if (!d->perm.length ().is_constant ())
> +     return false;
> 
> -  /* Permuting two variable-length vectors could overflow the
> -     index range.  */
> -  if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
> -    return false;
> +      /* This permutation reduces to the vec_perm optab if the elements are
> +      large enough to hold all selector indices.  Do not handle that case
> +      here, since the general TBL+SUB+TBL+ORR sequence is too expensive to
> +      be considered a "native" constant permutation.
> +
> +      Not doing this would undermine code that queries
> can_vec_perm_const_p
> +      with allow_variable_p set to false.  See PR121027.  */
> +      if (selector_fits_mode_p (d->vmode, d->perm))
> +     return false;
> +    }
> 
>    if (d->testing_p)
>      return true;
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
> new file mode 100644
> index 00000000000..6b920b8b681
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-O2 -msve-vector-bits=256" } */
> +
> +#include <arm_sve.h>
> +typedef svbfloat16_t vls_bfloat16_t __attribute__((arm_sve_vector_bits(32 *
> 8)));
> +svbfloat16_t foo(vls_bfloat16_t a, vls_bfloat16_t b)
> +{
> +  svbfloat16_t zero = svreinterpret_bf16_f32 (svdup_n_f32 (0.0f));
> +  return svzip2_bf16(zero, svuzp1_bf16(a,b));
> +}
> +
> +
> +/* { dg-final { scan-assembler-times {\tuzp1\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tzip2\t} 1 } } */
> +/* { dg-final { scan-assembler-not {\ttbl\t} } } */
> --
> 2.43.0

Reply via email to