https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:1f52396c6fc940224e9d858d49e41310a6dfa43d

commit r16-2205-g1f52396c6fc940224e9d858d49e41310a6dfa43d
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Fri Jul 11 16:48:41 2025 +0100

    aarch64: Tweak handling of general SVE permutes [PR121027]

    This PR is partly about a code quality regression that was triggered
    by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
    gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
    upon either (a) the original permutations not being "native" operations
    or (b) the combined permutation being a "native" operation.

    Whether something is a "native" operation is tested by calling
    can_vec_perm_const_p with allow_variable_p set to false.  This requires
    the permutation to be supported directly by
TARGET_VECTORIZE_VEC_PERM_CONST,
    rather than falling back to the general vec_perm optab.

    This exposed a problem with the way that we handled general 2-input
    permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
    an instruction to do general 2-input permutations.  We do still implement
    the vec_perm optab for SVE, but only when the vector length is known at
    compile time.  The general expansion is pretty expensive: an AND, a SUB,
    two TBLs, and an ORR.  It certainly couldn't be considered a "native"
    operation.

    However, if a VEC_PERM_EXPR has a constant selector, the indices can
    be wider than the elements being permuted.  This is not true for the
    vec_perm optab, where the indices and permuted elements must have the
    same precision.

    This leads to one case where we cannot leave a general 2-input permutation
    to be handled by the vec_perm optab: when permuting bytes on a target
    with 2048-bit vectors.  In that case, the indices of the elements in
    the second vector are in the range [256, 511], which cannot be stored
    in a byte index.

    TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
    permutations for one specific case.  Rather than check for that
    specific case, the code went ahead and used the vec_perm expansion
    whenever it worked.  But that undermines the !allow_variable_p
    handling in can_vec_perm_const_p; it becomes impossible for
    target-independent code to distinguish "native" operations from
    the worst-case fallback.

    This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
    cases that it has to handle.  It fixes the PR for all vector lengths
    except 2048 bits.

    A better fix would be to introduce some sort of costing mechanism,
    which would allow us to reject the new VEC_PERM_EXPR even for
    2048-bit targets.  But that would be a significant amount of work
    and would not be backportable.

    gcc/
            PR target/121027
            * config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
            operations that can be handled by vec_perm.

    gcc/testsuite/
            PR target/121027
            * gcc.target/aarch64/sve/acle/general/perm_1.c: New test.

Reply via email to