Tamar Christina <tamar.christ...@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandif...@arm.com> >> Sent: Friday, July 11, 2025 4:23 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti >> <alice.carlo...@arm.com>; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw >> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>; >> Wilco Dijkstra <wilco.dijks...@arm.com> >> Subject: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027] >> >> This PR is partly about a code quality regression that was triggered >> by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the >> gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional >> upon either (a) the original permutations not being "native" operations >> or (b) the combined permutation being a "native" operation. >> >> Whether something is a "native" operation is tested by calling >> can_vec_perm_const_p with allow_variable_p set to false. This requires >> the permutation to be supported directly by >> TARGET_VECTORIZE_VEC_PERM_CONST, >> rather than falling back to the general vec_perm optab. >> >> This exposed a problem with the way that we handled general 2-input >> permutations for SVE. Unlike Advanced SIMD, base SVE does not have >> an instruction to do general 2-input permutations. We do still implement >> the vec_perm optab for SVE, but only when the vector length is known at >> compile time. The general expansion is pretty expensive: an AND, a SUB, >> two TBLs, and an ORR. It certainly couldn't be considered a "native" >> operation. >> >> However, if a VEC_PERM_EXPR has a constant selector, the indices can >> be wider than the elements being permuted. This is not true for the >> vec_perm optab, where the indices and permuted elements must have the >> same precision. >> >> This leads to one case where we cannot leave a general 2-input permutation >> to be handled by the vec_perm optab: when permuting bytes on a target >> with 2048-bit vectors. In that case, the indices of the elements in >> the second vector are in the range [256, 511], which cannot be stored >> in a byte index. >> >> TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE >> permutations for one specific case. Rather than check for that >> specific case, the code went ahead and used the vec_perm expansion >> whenever it worked. But that undermines the !allow_variable_p >> handling in can_vec_perm_const_p; it becomes impossible for >> target-independent code to distinguish "native" operations from >> the worst-case fallback. >> >> This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the >> cases that it has to handle. It fixes the PR for all vector lengths >> except 2048 bits. >> >> A better fix would be to introduce some sort of costing mechanism, >> which would allow us to reject the new VEC_PERM_EXPR even for >> 2048-bit targets. But that would be a significant amount of work >> and would not be backportable. >> >> Tested on aarch64-linux-gnu. OK to install? > > Ok.
Thanks. > Thanks! > > I'm somewhat surprised by > "aarch64_expand_sve_vec_perm does not yet handle variable-length vectors" > > I assume cases that we could handle are if the permute values are > a series right? It doesn't seem like we could handle an arbitrary permute > with VLA. Yeah, but the constant selector series that we can handle are matched first, trying things like ZIP, UZP, TRN, EXT, etc. aarch64_evpc_sve_tbl is the fallback for when all other things fail. aarch64_expand_sve_vec_perm could probably be taught to handle 16-bit and wider elements for VLA. Not sure how good the code would be though. Richard