Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Friday, July 11, 2025 4:23 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti 
>> <alice.carlo...@arm.com>;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
>> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>;
>> Wilco Dijkstra <wilco.dijks...@arm.com>
>> Subject: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]
>> 
>> This PR is partly about a code quality regression that was triggered
>> by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
>> gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
>> upon either (a) the original permutations not being "native" operations
>> or (b) the combined permutation being a "native" operation.
>> 
>> Whether something is a "native" operation is tested by calling
>> can_vec_perm_const_p with allow_variable_p set to false.  This requires
>> the permutation to be supported directly by
>> TARGET_VECTORIZE_VEC_PERM_CONST,
>> rather than falling back to the general vec_perm optab.
>> 
>> This exposed a problem with the way that we handled general 2-input
>> permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
>> an instruction to do general 2-input permutations.  We do still implement
>> the vec_perm optab for SVE, but only when the vector length is known at
>> compile time.  The general expansion is pretty expensive: an AND, a SUB,
>> two TBLs, and an ORR.  It certainly couldn't be considered a "native"
>> operation.
>> 
>> However, if a VEC_PERM_EXPR has a constant selector, the indices can
>> be wider than the elements being permuted.  This is not true for the
>> vec_perm optab, where the indices and permuted elements must have the
>> same precision.
>> 
>> This leads to one case where we cannot leave a general 2-input permutation
>> to be handled by the vec_perm optab: when permuting bytes on a target
>> with 2048-bit vectors.  In that case, the indices of the elements in
>> the second vector are in the range [256, 511], which cannot be stored
>> in a byte index.
>> 
>> TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
>> permutations for one specific case.  Rather than check for that
>> specific case, the code went ahead and used the vec_perm expansion
>> whenever it worked.  But that undermines the !allow_variable_p
>> handling in can_vec_perm_const_p; it becomes impossible for
>> target-independent code to distinguish "native" operations from
>> the worst-case fallback.
>> 
>> This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
>> cases that it has to handle.  It fixes the PR for all vector lengths
>> except 2048 bits.
>> 
>> A better fix would be to introduce some sort of costing mechanism,
>> which would allow us to reject the new VEC_PERM_EXPR even for
>> 2048-bit targets.  But that would be a significant amount of work
>> and would not be backportable.
>> 
>> Tested on aarch64-linux-gnu.  OK to install?
>
> Ok.

Thanks.

> Thanks!
>
> I'm somewhat surprised by
> "aarch64_expand_sve_vec_perm does not yet handle variable-length vectors"
>
> I assume cases that we could handle are if the permute values are
> a series right? It doesn't seem like we could handle an arbitrary permute 
> with VLA.

Yeah, but the constant selector series that we can handle are matched
first, trying things like ZIP, UZP, TRN, EXT, etc.  aarch64_evpc_sve_tbl
is the fallback for when all other things fail.

aarch64_expand_sve_vec_perm could probably be taught to handle 16-bit
and wider elements for VLA.  Not sure how good the code would be though.

Richard

Reply via email to