Richard Sandiford <richard.sandif...@arm.com> writes: > TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1 > "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions. > This matching was conditional on !BYTES_BIG_ENDIAN. > > The ACLE code also lowered the associated SVE2.1 intrinsics into > suitable VEC_PERM_EXPRs. This lowering was not conditional on > !BYTES_BIG_ENDIAN. > > The mismatch led to lots of ICEs in the ACLE tests on big-endian > targets: we lowered to VEC_PERM_EXPRs that are not supported. > > I think the !BYTES_BIG_ENDIAN restriction was unnecessary. > SVE maps the first memory element to the least significant end of > the register for both endiannesses, so no endian correction or lane > number adjustment is necessary. > > This is in some ways a bit counterintuitive. ZIPQ1 is conceptually > "apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does > matter when choosing between Advanced SIMD ZIP1 and ZIP2. For example, > the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little- > endian and ZIP2 for big-endian. But the difference between the hybrid > VLA and Advanced SIMD permute selectors is a consequence of the > difference between the SVE and Advanced SIMD element orders. > > The same thing applies to ACLE intrinsics. The current lowering of > svzipq1 etc. is correct for both endiannesses. If ACLE code does: > > 2x svld1_s32 + svzipq1_s32 + svst1_s32 > > then the byte-for-byte result is the same for both endiannesses. > On big-endian targets, this is different from using the Advanced SIMD > sequence below for each 128-bit block: > > 2x LDR + ZIP1 + STR > > In contrast, the byte-for-byte result of: > > 2x svld1q_gather_s32 + svzipq1_s32 + svst11_scatter_s32 > > depends on endianness, since the quadword gathers and scatters use > Advanced SIMD byte ordering for each 128-bit block. This gather/scatter > sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR > sequence for both endiannesses. > > Programmers writing ACLE code have to be aware of this difference > if they want to support both endiannesses. > > The patch includes some new execution tests to verify the expansion > of the VEC_PERM_EXPRs. > > Tested on aarch64-linux-gnu and aarch64_be-elf. OK to install? > > Richard > > > gcc/ > * doc/sourcebuild.texi (aarch64_sve2_hw, aarch64_sve2p1_hw): Document. > * config/aarch64/aarch64.cc (aarch64_evpc_hvla): Extend to > BYTES_BIG_ENDIAN. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_aarch64_sve2p1_hw): > New proc. > * gcc.target/aarch64/sve2/dupq_1.c: Extend to big-endian. Add > noipa attributes. > * gcc.target/aarch64/sve2/extq_1.c: Likewise. > * gcc.target/aarch64/sve2/uzpq_1.c: Likewise. > * gcc.target/aarch64/sve2/zipq_1.c: Likewise.
Just noticed that I failed to add nopia to the other files -- will fix. > * gcc.target/aarch64/sve2/dupq_1_run.c: New test. > * gcc.target/aarch64/sve2/extq_1_run.c: Likewise. > * gcc.target/aarch64/sve2/uzpq_1_run.c: Likewise. > * gcc.target/aarch64/sve2/zipq_1_run.c: Likewise.