https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121027
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>: https://gcc.gnu.org/g:1f52396c6fc940224e9d858d49e41310a6dfa43d commit r16-2205-g1f52396c6fc940224e9d858d49e41310a6dfa43d Author: Richard Sandiford <richard.sandif...@arm.com> Date: Fri Jul 11 16:48:41 2025 +0100 aarch64: Tweak handling of general SVE permutes [PR121027] This PR is partly about a code quality regression that was triggered by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional upon either (a) the original permutations not being "native" operations or (b) the combined permutation being a "native" operation. Whether something is a "native" operation is tested by calling can_vec_perm_const_p with allow_variable_p set to false. This requires the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST, rather than falling back to the general vec_perm optab. This exposed a problem with the way that we handled general 2-input permutations for SVE. Unlike Advanced SIMD, base SVE does not have an instruction to do general 2-input permutations. We do still implement the vec_perm optab for SVE, but only when the vector length is known at compile time. The general expansion is pretty expensive: an AND, a SUB, two TBLs, and an ORR. It certainly couldn't be considered a "native" operation. However, if a VEC_PERM_EXPR has a constant selector, the indices can be wider than the elements being permuted. This is not true for the vec_perm optab, where the indices and permuted elements must have the same precision. This leads to one case where we cannot leave a general 2-input permutation to be handled by the vec_perm optab: when permuting bytes on a target with 2048-bit vectors. In that case, the indices of the elements in the second vector are in the range [256, 511], which cannot be stored in a byte index. TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE permutations for one specific case. Rather than check for that specific case, the code went ahead and used the vec_perm expansion whenever it worked. But that undermines the !allow_variable_p handling in can_vec_perm_const_p; it becomes impossible for target-independent code to distinguish "native" operations from the worst-case fallback. This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the cases that it has to handle. It fixes the PR for all vector lengths except 2048 bits. A better fix would be to introduce some sort of costing mechanism, which would allow us to reject the new VEC_PERM_EXPR even for 2048-bit targets. But that would be a significant amount of work and would not be backportable. gcc/ PR target/121027 * config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input operations that can be handled by vec_perm. gcc/testsuite/ PR target/121027 * gcc.target/aarch64/sve/acle/general/perm_1.c: New test.