Re: [PATCH] aarch64: Extend HVLA permutations to big-endian

Richard Sandiford Wed, 09 Jul 2025 08:00:30 -0700

Richard Sandiford <richard.sandif...@arm.com> writes:
> TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
> "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
> This matching was conditional on !BYTES_BIG_ENDIAN.
>
> The ACLE code also lowered the associated SVE2.1 intrinsics into
> suitable VEC_PERM_EXPRs.  This lowering was not conditional on
> !BYTES_BIG_ENDIAN.
>
> The mismatch led to lots of ICEs in the ACLE tests on big-endian
> targets: we lowered to VEC_PERM_EXPRs that are not supported.
>
> I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
> SVE maps the first memory element to the least significant end of
> the register for both endiannesses, so no endian correction or lane
> number adjustment is necessary.
>
> This is in some ways a bit counterintuitive.  ZIPQ1 is conceptually
> "apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
> matter when choosing between Advanced SIMD ZIP1 and ZIP2.  For example,
> the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
> endian and ZIP2 for big-endian.  But the difference between the hybrid
> VLA and Advanced SIMD permute selectors is a consequence of the
> difference between the SVE and Advanced SIMD element orders.
>
> The same thing applies to ACLE intrinsics.  The current lowering of
> svzipq1 etc. is correct for both endiannesses.  If ACLE code does:
>
>   2x svld1_s32 + svzipq1_s32 + svst1_s32
>
> then the byte-for-byte result is the same for both endiannesses.
> On big-endian targets, this is different from using the Advanced SIMD
> sequence below for each 128-bit block:
>
>   2x LDR + ZIP1 + STR
>
> In contrast, the byte-for-byte result of:
>
>   2x svld1q_gather_s32 + svzipq1_s32 + svst1q_scatter_s32
>
> depends on endianness, since the quadword gathers and scatters use
> Advanced SIMD byte ordering for each 128-bit block.  This gather/scatter
> sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
> sequence for both endiannesses.
>
> Programmers writing ACLE code have to be aware of this difference
> if they want to support both endiannesses.
>
> The patch includes some new execution tests to verify the expansion
> of the VEC_PERM_EXPRs.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?
>
> Richard
>
>
> gcc/
>       * doc/sourcebuild.texi (aarch64_sve2_hw, aarch64_sve2p1_hw): Document.
>       * config/aarch64/aarch64.cc (aarch64_evpc_hvla): Extend to
>       BYTES_BIG_ENDIAN.
>
> gcc/testsuite/
>       * lib/target-supports.exp (check_effective_target_aarch64_sve2p1_hw):
>       New proc.
>       * gcc.target/aarch64/sve2/dupq_1.c: Extend to big-endian.  Add
>       noipa attributes.
>       * gcc.target/aarch64/sve2/extq_1.c: Likewise.
>       * gcc.target/aarch64/sve2/uzpq_1.c: Likewise.
>       * gcc.target/aarch64/sve2/zipq_1.c: Likewise.


Just noticed that I failed to add noipa to the other files -- will fix.

>       * gcc.target/aarch64/sve2/dupq_1_run.c: New test.
>       * gcc.target/aarch64/sve2/extq_1_run.c: Likewise.
>       * gcc.target/aarch64/sve2/uzpq_1_run.c: Likewise.
>       * gcc.target/aarch64/sve2/zipq_1_run.c: Likewise.
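
To illustrate the element-order point in the quoted description, here is a
minimal scalar sketch of what the svld1 + svzipq1 + svst1 sequence computes
for 32-bit elements.  It is not code from the patch: the function name
model_zipq1_s32 is hypothetical, and the model assumes only what the
description says, namely that SVE element 0 is the first element in memory
for both endiannesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of ZIPQ1 on 32-bit elements, expressed in
   memory (SVE element) order.  Each 128-bit block holds four elements;
   the result interleaves the low two elements of the matching blocks
   of A and B.  N is the element count and must be a multiple of 4.  */
static void
model_zipq1_s32 (const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
  assert (n % 4 == 0);
  for (size_t base = 0; base < n; base += 4)
    {
      out[base + 0] = a[base + 0];
      out[base + 1] = b[base + 0];
      out[base + 2] = a[base + 1];
      out[base + 3] = b[base + 1];
    }
}
```

The model uses only array indices, so it gives the same answer on either
endianness, which matches the byte-for-byte claim for the svld1/svst1
sequence.  It does not model the Advanced SIMD LDR + ZIP1 + STR sequence or
the quadword gather/scatter sequence, whose results depend on endianness.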
